I’m using the csv module to read and parse 76,000 rows of patient data
in a CSV file. I use the line below to read in the file and loop
through the rows.
CSV.open("patientfile.txt", "r") do |row|
When I get to a row like below the script blows up:
/usr/local/lib/ruby/1.8/csv.rb:639:in `get_row': CSV::IllegalFormatError (CSV::IllegalFormatError)
from /usr/local/lib/ruby/1.8/csv.rb:556:in `each'
from /usr/local/lib/ruby/1.8/csv.rb:531:in `parse'
from /usr/local/lib/ruby/1.8/csv.rb:311:in `open_reader'
from /usr/local/lib/ruby/1.8/csv.rb:85:in `open'
from sync.rb:1
The row is similar to below. Note the embedded “B” within the address
field.
"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"
Is there a way to get around this error and escape the “B” properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
On 5/5/06, Sean C. wrote:
Is there a way to get around this error and escape the “B” properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
It’s just a guess, but maybe you could try replacing every
double-quote character that isn’t either preceded or followed by a
comma with a single quote? Something like the untested code below:
line.gsub(/[^,]"[^,]/,"'")
It would probably require reading the whole file first, writing out a
corrected version, and then calling the CSV methods on that, but it
beats doing it by hand :).
On May 5, 2006, at 11:48 AM, Sean C. wrote:
"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"
Well, the long and the short of this story is that the above line is
not valid CSV. Gotta fix that somehow: by hand, with a
preprocessor, or by fixing the broken software that spit it out. 
James Edward G. II
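For the "flag this record and move on" option raised in the question, a minimal sketch follows. It assumes the modern `csv` standard library (Ruby 1.9 or later, where the parse error is `CSV::MalformedCSVError` rather than the 1.8 `CSV::IllegalFormatError` shown in the trace), and it assumes no field contains an embedded newline, since each physical line is parsed on its own. The helper name `split_good_and_bad` and the first sample row are made up for illustration.

```ruby
require "csv"

# Hypothetical helper: parse each physical line separately and set aside
# the lines the parser rejects, instead of aborting on the first bad row.
# Assumes no field contains an embedded newline.
def split_good_and_bad(lines)
  good = []
  bad = []
  lines.each_with_index do |line, index|
    begin
      good << CSV.parse_line(line.chomp)
    rescue CSV::MalformedCSVError
      bad << [index + 1, line.chomp] # remember the 1-based line number
    end
  end
  [good, bad]
end

lines = [
  # Made-up valid row, for contrast only.
  %q{"M0000001","Jane","B","Roe","12 MAIN ST","","Sometown","ST","55555"},
  # The problem row from the thread.
  %q{"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"}
]

good, bad = split_good_and_bad(lines)
puts "parsed #{good.size} row(s), flagged #{bad.size} row(s)"
bad.each { |number, text| puts "line #{number}: #{text}" }
```

With a real file, the `lines` array could be replaced by `File.foreach("patientfile.txt")`, and the flagged lines written out for manual repair.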
Sean C. wrote:
…
"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"
Is there a way to get around this error and escape the “B” properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
Escape quotes by doubling them:
gsub('"B"', '""B""')
Cheers,
Dave
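To illustrate why doubling works: inside a quoted CSV field, two consecutive double quotes stand for one literal quote. The sketch below checks this against the problem row, assuming the modern `csv` standard library (Ruby 1.9 or later) rather than the 1.8 module used in the thread.

```ruby
require "csv"

# The problem row from the thread, with the inner quotes doubled by hand.
fixed = %q{"M1234567","John","A","Doe","321 NORTH ""B"" ST","","Sometown","ST","55555"}

row = CSV.parse_line(fixed)
puts row.size # 9 fields
puts row[4]   # 321 NORTH "B" ST
```

The parser collapses each doubled quote back to a single one, so the address comes out with its original quotes intact.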
Bira wrote:
line.gsub(/[^,]"[^,]/,"'")
Bira, I’m testing your idea with the below script but I’m having
problems. Thanks for the start though.
TEST PROGRAM:
line = '"NAME","610 "A" STREET","STATE","POSTAL_CODE"'
puts line
# If a double quote is not preceded by a comma and not followed
# by a comma, then replace the quotation mark with a single quote.
new_line = line.gsub(/[^,]"[^,]/,"'")
puts new_line
OUTPUT:
"NAME","610 "A" STREET","STATE","POSTAL_CODE"
"NAME","610'" STREET","STATE","POSTAL_CODE"
Any ideas, regular expression masters?
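The output above goes wrong because `[^,]` on each side of the quote is part of the match, so the space before the first inner quote and the `A` after it are replaced along with the quote. One possible fix is to test the neighbouring characters with lookarounds, which do not consume them. The sketch below assumes Ruby 1.9 or later (the 1.8 regex engine has no lookbehind) and combines this with the quote-doubling suggested earlier in the thread. The method name `escape_embedded_quotes` is made up for illustration, and the rule is only a heuristic: it would misfire on a field that legitimately contains a quote next to a comma.

```ruby
require "csv"

# Hypothetical helper: double every quote that has a non-comma character
# on both sides. Lookbehind/lookahead check the neighbours without
# consuming them, so the surrounding text is left untouched. Quotes at
# the very start or end of the line have no neighbour on one side and
# are therefore skipped; chomp keeps a trailing newline from counting.
def escape_embedded_quotes(line)
  line.chomp.gsub(/(?<=[^,])"(?=[^,])/, '""')
end

line = %q{"NAME","610 "A" STREET","STATE","POSTAL_CODE"}
repaired = escape_embedded_quotes(line)

puts repaired # "NAME","610 ""A"" STREET","STATE","POSTAL_CODE"
p CSV.parse_line(repaired)
```

Run over every line of the file before parsing, this would act as the preprocessing step suggested earlier, with the same caveat that it guesses at the intended field boundaries.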