I’m using Rails 2.3.8 with Ruby 1.9.1 and I’m having a problem with
serialized attributes in ActiveRecord not preserving string encodings.
The underlying problem is probably YAML, but I’m wondering if anyone has
any good ideas on how to handle this. The app I’m working on has
numerous serialized fields, some of which contain deep structures of
arrays and hashes. Getting back an ASCII-8BIT string (that’s actually
UTF-8) deep within those structures wreaks havoc later…
This is perhaps best illustrated by example: if I save l to a serialized
attribute in an ActiveRecord model, I get back l2 when reading it from
the database.
l
=> ["English", "Türkçe", "Русский"]
l.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>]
l.map(&:valid_encoding?)
=> [true, true, true]
l.to_yaml
=> "--- \n- English\n- !binary |\n VMO8cmvDp2U=\n\n- "\xD0\xA0\xD1\x83\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8\xD0\xB9"\n"
l2 = YAML.load(l.to_yaml)
=> ["English", "T\xC3\xBCrk\xC3\xA7e", "Русский"]
l2.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>]
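To make the second element concrete: the !binary payload in the YAML above is plain Base64 of the original bytes. Here is a minimal sketch in plain Ruby with no YAML involved (the names payload, raw and fixed are made up for illustration). It suggests that the bytes survive intact and only the encoding tag is lost:

```ruby
# Base64 payload copied from the !binary entry in the YAML dump above.
payload = "VMO8cmvDp2U="

# "m" is the Base64 directive of String#unpack; the decoded string is
# tagged ASCII-8BIT because Base64 carries no encoding information.
raw = payload.unpack("m").first
raw.encoding           # the binary encoding (ASCII-8BIT)
raw.bytes.to_a.length  # 8 bytes: T, C3 BC, r, k, C3 A7, e

# Re-tagging (not transcoding) is enough, since the bytes are untouched.
fixed = raw.dup.force_encoding("UTF-8")
fixed.valid_encoding?  # true
```

Note that force_encoding only changes the label on the string; it would be the wrong tool if the bytes had actually been converted to another encoding.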
Does anyone know how YAML decides whether to store a string as
binary or as an escaped string? Both of the last two strings above
contain non-ASCII characters, but only the first of them (Türkçe) is
stored as binary…
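
As a stopgap for the deep structures of arrays and hashes mentioned above, one possible approach is to walk the loaded value and re-tag any ASCII-8BIT string whose bytes happen to be valid UTF-8. This is only a sketch: retag_utf8 is a hypothetical helper, not something ActiveRecord or YAML provides, and it assumes that every such string really was UTF-8 to begin with.

```ruby
# Hypothetical helper (not part of ActiveRecord or YAML): recursively walks
# arrays and hashes and re-tags binary strings whose bytes are valid UTF-8.
def retag_utf8(value)
  case value
  when String
    return value unless value.encoding == Encoding::ASCII_8BIT
    candidate = value.dup.force_encoding(Encoding::UTF_8)
    # Strings that are not valid UTF-8 are assumed to be real binary data.
    candidate.valid_encoding? ? candidate : value
  when Array
    value.map { |item| retag_utf8(item) }
  when Hash
    result = {}
    value.each { |key, item| result[retag_utf8(key)] = retag_utf8(item) }
    result
  else
    value
  end
end

# Simulate the mis-tagged element from the round trip shown above.
broken = ["English", "T\xC3\xBCrk\xC3\xA7e".dup.force_encoding("ASCII-8BIT")]
repaired = retag_utf8(broken)
repaired.map { |s| s.encoding }  # both elements now report UTF-8
```

The helper returns new containers rather than modifying the loaded ones in place. Where to call it (for example, in a reader for the serialized attribute) is left open, and short binary blobs that happen to be valid UTF-8 would be re-tagged too.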