CSV Goes M17n

bbazzarrakk · September 21, 2008, 7:15pm

I’ve just finished an extensive reworking of the standard CSV library
in Ruby 1.9 (formerly FasterCSV). CSV’s parser and generator are now
m17n aware. This means they should work naturally with your data in
any non-“dummy” Encoding Ruby 1.9 supports.

Everything is documented so it should be pretty easy to figure out how
to use the new system, but generally you just set the Encoding for
your IO or String objects correctly and CSV should do the rest:

reading example

CSV.foreach(…, :encoding => "…") do |row|
  # row will be parsed but not transcoded here
end

writing example

CSV.open(…, "wb:…") do |csv|
  csv << data
  # data will be quoted and separated with characters
  # in the proper encoding
end
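
For anyone who wants something runnable, here is a minimal round-trip sketch of the two examples above. The sample rows, the file name, and the choice of ISO-8859-1 are my own assumptions for illustration and are not from the original post. It writes a file in one encoding and reads it back, and the fields that come back should still carry that encoding rather than being transcoded.

```ruby
require "csv"
require "tmpdir"

# Hypothetical sample data; the \u escapes keep these literals UTF-8.
rows = [["name", "city"], ["Jos\u00E9", "M\u00E1laga"]]
read_back = []

Dir.mktmpdir do |dir|
  path = File.join(dir, "people.csv") # illustrative file name

  # Writing: the mode string sets the file's external encoding.
  CSV.open(path, "wb:ISO-8859-1") do |csv|
    rows.each do |row|
      # Hand CSV fields that are already in the target encoding.
      csv << row.map { |field| field.encode("ISO-8859-1") }
    end
  end

  # Reading: rows are parsed in the file's encoding, not transcoded.
  CSV.foreach(path, :encoding => "ISO-8859-1") do |row|
    read_back << row
  end
end

# Inspect the encoding each field came back with.
read_back.each { |row| p row.map(&:encoding) }
```

If you want UTF-8 strings in your program, you can call encode("UTF-8") on the fields yourself after reading.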

Encodings default to Encoding.default_external if not provided.
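
A small sketch of that default, assuming a throwaway file and file name of my own choosing: no encoding is given anywhere, so the parsed fields should report Encoding.default_external (or Encoding.default_internal, if your environment sets one).

```ruby
require "csv"
require "tmpdir"

fields = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "plain.csv") # illustrative file name
  File.write(path, "a,b\n")

  # No :encoding option and no encoding in a mode string.
  CSV.foreach(path) { |row| fields = row }
end

puts "default_external: #{Encoding.default_external}"
puts "field encodings:  #{fields.map(&:encoding).inspect}"
```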

I had to change quite a bit of code to support this. I tried to test
well, but it’s possible I introduced some new bugs. Please let me
know if you find any issues.

I suspect this is probably one of the first fully m17n-compatible
implementations, so I hope it can serve as a guide to others wanting
to provide similar support in their libraries. I know I learned a ton
just figuring out how to do this. Feel free to ask me questions about
multi-encoding support. I’ll sure try to answer them if I can.

Finally, here’s some fun news to look forward to: even with the m17n
support, CSV on Ruby 1.9 is over three times faster than FasterCSV on
Ruby 1.8 thanks to the speed of the new VM and the switch to
Oniguruma. Three cheers to the core team for giving us a much faster
Ruby!

James Edward G. II

bbazzarrakk · September 22, 2008, 5:58pm

On Mon, Sep 22, 2008 at 02:06:42AM +0900, James G. wrote:

CSV.foreach(…, :encoding => "…") do |row|
Encodings default to Encoding.default_external if not provided.

Finally, here’s some fun news to look forward to: even with the m17n
support, CSV on Ruby 1.9 is over three times faster than FasterCSV on Ruby
1.8 thanks to the speed of the new VM and the switch to Oniguruma. Three
cheers to the core team for giving us a much faster Ruby!

Awesome James!

FasterCSV is under very heavy utilization over here and we’re always
glad you made such a fine library.

enjoy,

-jeremy