Parsing a CSV file with Ruby

dfg59 · August 30, 2006, 5:57pm

I’m currently trying to do something that seems rather simple, but I’m
fairly new to Ruby. I want to read in a CSV file, find rows that are
distinct with respect to one of the elements in the row (for example,
all rows in which the first element is “A”), and then do something with
these rows (in this case, parse them, build some XML and write it to a
file). I’m not familiar enough with iterators in Ruby, but I seem to
remember there being functionality that will allow me to get distinct
rows based on some element in the row. Let me know if this is possible
and how I should approach it.

Thanks,
Drew

dfg59 · August 30, 2006, 6:01pm

Let me be more specific: essentially I want to find the groups of rows
that share an element. Let’s say each row in my CVS doc has 3 elements.
I want to iterate across every group of rows that share the same value
for the first element. Hope this makes sense.

dfg59 · August 30, 2006, 6:59pm

Drew O. wrote:

Let me be more specific: essentially I want to find the groups of rows
that share an element. Let’s say each row in my CVS doc has 3 elements.
I want to iterate across every group of rows that share the same value
for the first element. Hope this makes sense.

#!/usr/bin/ruby -w

# Group each line of data.txt by its first comma-separated field.
row_hash = {}

File.foreach("data.txt") { |record|
  fields = record.split(",")
  row_hash[fields.first] ||= []
  row_hash[fields.first] << record
}

# Print the groups in sorted key order, one tab-indented record per line.
row_hash.keys.sort.each { |key|
  puts "Group: #{key}"
  row_hash[key].each { |record|
    puts "\t#{record}"
  }
}

data.txt:

a,this,is,one,record
a,this,is,another,record
b,this,is,one,record
b,this,is,another,record
c,this,is,one,record
c,this,is,another,record

output:

Group: a
	a,this,is,one,record
	a,this,is,another,record
Group: b
	b,this,is,one,record
	b,this,is,another,record
Group: c
	c,this,is,one,record
	c,this,is,another,record
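
One caveat with splitting each line on ",": a quoted field that contains a comma gets cut in two. Below is a minimal sketch of the same grouping idea using the standard csv library instead, assuming the data may contain quoted fields. The helper name group_rows_by_first_field and the sample data are made up for illustration and are not from the thread.

```ruby
require "csv"

# Group parsed CSV rows by the value in their first column.
# NOTE: group_rows_by_first_field is a hypothetical helper name.
def group_rows_by_first_field(csv_text)
  # The block form of Hash.new creates an empty array the first
  # time a key is seen, so no explicit "unless" check is needed.
  groups = Hash.new { |hash, key| hash[key] = [] }
  CSV.parse(csv_text) do |row|
    next if row.empty?          # ignore blank lines
    groups[row.first] << row    # each row is an array of fields
  end
  groups
end

# Made-up sample data; the second row has a quoted field with a comma.
sample = <<~DATA
  a,this,is,one,record
  a,"this, with a comma",is,another,record
  b,this,is,one,record
DATA

group_rows_by_first_field(sample).each do |key, rows|
  puts "Group: #{key}"
  rows.each { |row| puts "\t#{row.join(' | ')}" }
end
```

Because the rows are already parsed into arrays, each group can be handed straight to whatever builds the output without splitting the line again.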

dfg59 · August 30, 2006, 8:47pm

Thank you both for the responses. Both seem to be EXTREMELY helpful.
I’ll be sure to post any issues I have in the forum in the future.

Thanks,
Drew

dfg59 · August 30, 2006, 7:07pm

On Aug 30, 2006, at 11:01 AM, Drew O. wrote:

Let me be more specific: essentially I want to find the groups of rows
that share an element. Let’s say each row in my CVS doc has 3 elements.
I want to iterate across every group of rows that share the same value
for the first element. Hope this makes sense.

I’m assuming you meant CSV (not CVS).

See if this gets you going:

Firefly:~/Desktop$ cat data.csv
one,1,A
one,2,B
one,3,C
two,1,A
two,2,B
three,1,A
Firefly:~/Desktop$ irb -r csv
>> rows = CSV.read("data.csv")
=> [["one", "1", "A"], ["one", "2", "B"], ["one", "3", "C"], ["two", "1", "A"], ["two", "2", "B"], ["three", "1", "A"]]
>> groups = rows.map { |row| row.first }.uniq
=> ["one", "two", "three"]
>> groups.each do |group|
?>   puts group
>>   rows.select { |row| row.first == group }.each { |row| puts "  #{row.inspect}" }
>> end
one
  ["one", "1", "A"]
  ["one", "2", "B"]
  ["one", "3", "C"]
two
  ["two", "1", "A"]
  ["two", "2", "B"]
three
  ["three", "1", "A"]
=> ["one", "two", "three"]

James Edward G. II
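
For readers on a newer Ruby: Enumerable#group_by (in core since Ruby 1.8.7, so not available in plain Ruby when this thread was written) does the map/uniq/select work above in a single pass. A minimal sketch, assuming the same sample data as data.csv, inlined here as a string so the example is self-contained:

```ruby
require "csv"

# Same rows as data.csv in the post above, inlined for the example.
csv_text = <<~DATA
  one,1,A
  one,2,B
  one,3,C
  two,1,A
  two,2,B
  three,1,A
DATA

rows = CSV.parse(csv_text)

# group_by returns a Hash mapping each first-column value to the
# array of rows that share it, in order of first appearance.
grouped = rows.group_by { |row| row.first }

grouped.each do |group, members|
  puts group
  members.each { |row| puts "  #{row.inspect}" }
end
```

The select-per-group version rescans every row once per group, while group_by walks the rows only once, which may matter for larger files.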