All -
I’ve written a script to split a .csv file into smaller .csv files of
40,000 lines each. The intent here is to break the file down enough so
that Excel does not have issues reading each chunk. My code takes a
filename from the command line and breaks it down as follows:
infile -> xyz.csv
output -> xyz_part_1.csv
          xyz_part_2.csv
          etc…
My code is working but I don’t find it very “rubyish”. In particular, I
hate having my index and counter variables and I don’t like that I had to
declare my header variable outside of the loop. Bear in mind here that I
cannot do something like “rows = CSV.open(infile)” because Ruby will
raise an error, as the input file is too big (250 MB). Any advice on
making the code nicer is appreciated. The current code is as follows:
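require 'csv'

infile = ARGV[0] if ARGV[0] != nil
counter = 1
index = 0
header = ""
# Insert "_part_N" before the extension: xyz.csv -> xyz_part_1.csv
writer = CSV.open(infile.gsub(/\./, "_part_" + counter.to_s + "."), 'w')
# CSV.foreach yields one row at a time. On Ruby 1.9+ CSV.open with a
# block yields the CSV object itself rather than each row.
CSV.foreach(infile) do |row|
  # Every 40,000 rows, close the current chunk and start the next one
  if index != 0 && index % 40000 == 0
    writer.close
    counter += 1
    writer = CSV.open(infile.gsub(/\./, "_part_" + counter.to_s + "."), 'w')
    writer << header
  end
  # The first row of the input is the header; keep it for later chunks
  if index == 0
    header = row
  end
  writer << row
  index += 1
end
writer.close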
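
For comparison, here is a rough sketch of one way the same split could be written without the manual index and counter. It assumes a Ruby version (1.9 or later) where CSV.open yields a CSV object that is Enumerable, so each_slice can pull rows in batches. The split_csv name and its rows_per_file parameter are made up for illustration. Note one difference from the code above: every output file here gets the header plus up to 40,000 data rows, rather than the first file holding 40,000 lines in total.

```ruby
require 'csv'

# Hypothetical helper (name and signature are illustrative only).
# Splits infile into numbered parts and returns the list of paths written.
def split_csv(infile, rows_per_file = 40_000)
  ext   = File.extname(infile)   # ".csv"
  base  = infile.chomp(ext)      # "xyz" for "xyz.csv"
  parts = []

  CSV.open(infile, 'r') do |csv|
    header = csv.shift           # first row; nil if the file is empty
    return parts if header.nil?

    # each_slice reads on from the current position, so at most
    # rows_per_file rows are held in memory at any one time.
    csv.each_slice(rows_per_file).with_index(1) do |rows, part|
      path = "#{base}_part_#{part}#{ext}"
      CSV.open(path, 'w') do |out|
        out << header
        rows.each { |row| out << row }
      end
      parts << path
    end
  end

  parts
end

# Run from the command line: ruby split.rb xyz.csv
split_csv(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Because both files are opened with a block, they are closed automatically, and the header only needs to live inside the outer block.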