2016-04-06

Split a CSV into multiple files in Python

I have a CSV file of about 5000 rows and I want to split it into five files using Python.

I wrote some code for it, but it isn't working:

import codecs
import csv
NO_OF_LINES_PER_FILE = 1000
def again(count_file_header,count):
    f3 = open('write_'+count_file_header+'.csv', 'at')
    with open('import_1458922827.csv', 'rb') as csvfile:
        candidate_info_reader = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_ALL)
        co = 0
        for row in candidate_info_reader:
            co = co + 1
            count = count + 1
            if count <= count:
                pass
            elif count >= NO_OF_LINES_PER_FILE:
                count_file_header = count + NO_OF_LINES_PER_FILE
                again(count_file_header,count)
            else:
                writer = csv.writer(f3,delimiter = ',', lineterminator='\n',quoting=csv.QUOTE_ALL)
                writer.writerow(row)

def read_write():
    f3 = open('write_'+NO_OF_LINES_PER_FILE+'.csv', 'at')
    with open('import_1458922827.csv', 'rb') as csvfile:

        candidate_info_reader = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_ALL)

        count = 0
        for row in candidate_info_reader:
            count = count + 1
            if count >= NO_OF_LINES_PER_FILE:
                count_file_header = count + NO_OF_LINES_PER_FILE
                again(count_file_header,count)
            else:
                writer = csv.writer(f3,delimiter = ',', lineterminator='\n',quoting=csv.QUOTE_ALL)
                writer.writerow(row)

read_write()

The code above creates many empty files.

How can I split one file into five CSV files?

Answer

I suggest not reinventing the wheel: there is an existing solution. Source: here

import csv
import os


def split(filehandler, delimiter=',', row_limit=1000,
          output_name_template='output_%s.csv', output_path='.', keep_headers=True):
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    # newline='' keeps the csv module from writing blank lines between rows on Windows
    current_out_file = open(current_out_path, 'w', newline='')
    current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
    current_limit = row_limit
    if keep_headers:
        # Python 3: next(reader) instead of reader.next()
        headers = next(reader)
        current_out_writer.writerow(headers)
    for i, row in enumerate(reader):
        if i + 1 > current_limit:
            # Close the finished piece so its data is flushed to disk
            current_out_file.close()
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = os.path.join(
                output_path,
                output_name_template % current_piece
            )
            current_out_file = open(current_out_path, 'w', newline='')
            current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)
    # Close the last piece as well
    current_out_file.close()

Use it like this:

with open('/your/path/input.csv', 'r') as f:
    split(f)

In Python 2, if blank lines between rows are a problem, just open the output file with "wb" instead of "w" (in Python 3 the equivalent fix is newline='').


Use next(reader) instead of reader.next() in Python 3.
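The split function above can also be expressed with itertools.islice, which pulls at most row_limit rows from the reader at a time. This is a minimal sketch, not the linked source's code: the function name split_csv and the part_%d.csv naming are hypothetical, and it assumes the first row is a header that should be repeated in every piece.

```python
import csv
import itertools
import os


def split_csv(input_path, row_limit=1000, output_dir='.',
              name_template='part_%d.csv', keep_headers=True):
    """Hypothetical helper: split input_path into pieces of at most row_limit data rows.

    Returns the list of output paths that were written.
    """
    written = []
    with open(input_path, newline='') as infile:
        reader = csv.reader(infile)
        # Read the header once so it can be repeated at the top of every piece
        headers = next(reader, None) if keep_headers else None
        for piece in itertools.count(1):
            # Take the next row_limit rows; an empty chunk means the input is exhausted
            chunk = list(itertools.islice(reader, row_limit))
            if not chunk:
                break
            out_path = os.path.join(output_dir, name_template % piece)
            with open(out_path, 'w', newline='') as outfile:
                writer = csv.writer(outfile)
                if headers is not None:
                    writer.writerow(headers)
                writer.writerows(chunk)
            written.append(out_path)
    return written
```

For example, split_csv('import_1458922827.csv') would write part_1.csv, part_2.csv, and so on to the current directory. Each output file is opened in its own with block, so it is closed as soon as its chunk is written, and only row_limit rows are held in memory at once.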


In Python

Use readlines() and writelines() to do it. Here is an example:

>>> with open('import_1458922827.csv', 'r') as f:
...     csvfile = f.readlines()
...
>>> filename = 1
>>> for i in range(0, len(csvfile), 1000):
...     with open(str(filename) + '.csv', 'w') as out:
...         out.writelines(csvfile[i:i+1000])
...     filename += 1
...

The output files will be named 1.csv, 2.csv, and so on.
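One caveat with the readlines() approach: it splits on physical lines, not CSV records, so a quoted field that contains a newline counts as two lines and could be cut across two output files. A small self-contained illustration (the sample data is made up) compares the two counts:

```python
import csv
import io

# Made-up sample: the second record has a quoted field spanning two physical lines
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

physical_lines = io.StringIO(data).readlines()
records = list(csv.reader(io.StringIO(data, newline='')))

print(len(physical_lines))  # 4: the header plus three physical lines for two records
print(len(records))         # 3: the header plus two CSV records
```

If the data can contain such fields, counting rows with csv.reader (as the split function does) is safer than counting lines.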

From the terminal

FYI, you can also do this from the command line with split:

$ split -l 1000 import_1458922827.csv 

Very nice, thanks.


You're welcome :-) Thanks.


What happens with a file of 5003 lines? Will you lose the last 3 lines?
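Slicing with csvfile[i:i+1000] does not drop the remainder: the loop still reaches i = 5000 and writes the final 3 lines to a sixth file. If exactly five files are wanted, as in the original question, the per-file count can be derived with ceiling division instead. A quick check, with a plain list standing in for the lines of a file (the helper chunk_sizes is hypothetical):

```python
import math


def chunk_sizes(n_lines, lines_per_file):
    """Hypothetical helper: sizes of the pieces produced by slicing in steps of lines_per_file."""
    lines = list(range(n_lines))  # stand-in for the result of readlines()
    return [len(lines[i:i + lines_per_file]) for i in range(0, n_lines, lines_per_file)]


# Fixed 1000 lines per file: the last 3 lines end up in a sixth file, not lost
print(chunk_sizes(5003, 1000))  # [1000, 1000, 1000, 1000, 1000, 3]

# Exactly five files: round the per-file count up so nothing is left over
per_file = math.ceil(5003 / 5)  # 1001
print(chunk_sizes(5003, per_file))  # [1001, 1001, 1001, 1001, 999]
```

Note that for very small inputs, rounding up can yield fewer than five pieces (for example, 6 lines split 2 per file gives only 3 files).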