I’ve just finished an extensive reworking of the standard CSV library
in Ruby 1.9 (formerly FasterCSV). CSV’s parser and generator are now
m17n-aware. This means they should work naturally with your data in
any non-“dummy” Encoding Ruby 1.9 supports.
Everything is documented so it should be pretty easy to figure out how
to use the new system, but generally you just set the Encoding for
your IO or String objects correctly and CSV should do the rest:

  # reading example
  CSV.foreach(…, :encoding => "…") do |row|
    # row will be parsed but not transcoded here
  end

  # writing example
  CSV.open(…, "wb:…") do |csv|
    csv << data
    # data will be quoted and separated with characters
    # in the proper encoding
  end

Encodings default to Encoding.default_external if not provided.
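
To make the "parsed but not transcoded" point concrete, here is a small sketch that works on in-memory Strings rather than files. The sample data and the variable names are made up for illustration, and it assumes the source file itself is UTF-8 (Ruby's default for literals). It only shows the behavior described above as I understand it: fields come back in the Encoding the data was already in.

```ruby
require "csv"

# Hypothetical sample data: a UTF-8 literal re-encoded as ISO-8859-1,
# standing in for data read from a Latin-1 source.
latin1_data = "name,city\nJosé,Málaga\n".encode("ISO-8859-1")

# Parsing: CSV works in the String's own Encoding and does not transcode.
rows  = CSV.parse(latin1_data)
field = rows[1][0]
puts field.encoding            # the Encoding of the input data
puts field.encode("UTF-8")     # transcode explicitly only if you want to

# Writing: separators and quotes are emitted in the Encoding of the fields.
line = CSV.generate_line(rows[1])
puts line.encoding
```

If you do want transcoding, ask Ruby for it yourself, for example with String#encode as shown, rather than expecting CSV to do it behind your back.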
I had to change quite a bit of code to support this. I tried to test
well, but it’s possible I introduced some new bugs. Please let me
know if you find any issues.
I suspect this is one of the first fully m17n-compatible
implementations, so I hope it can serve as a guide to others wanting
to provide similar support in their libraries. I know I learned a ton
just figuring out how to do this. Feel free to ask me questions about
multi-encoding support. I’ll sure try to answer them if I can.
Finally, here’s some fun news to look forward to: even with the m17n
support, CSV on Ruby 1.9 is over three times faster than FasterCSV on
Ruby 1.8 thanks to the speed of the new VM and the switch to
Oniguruma. Three cheers to the core team for giving us a much faster
Ruby!
James Edward G. II