Hi!
I have problem with legacy database in MSSQL encoding.
The new app in Rails and most of the database is Utf8, while
tables from legacy MSSQL are encoded in ISO-8859-1.
I am not sure how to approach this problem. I would imagine
creation of some kind of filter with Iconv.conv(“UTF-8”, “ISO-8859-1”,
input)
for reading sql statements and the reverse for write methods.
def iso1_to_utf8
Iconv.conv(“UTF-8”, “ISO-8859-1”, self) rescue self
end
def utf8_to_iso1
Iconv.conv(“ISO-8859-1”, “UTF-8”, self) rescue self
end
end
class NCustomer < ActiveRecord::Base
…
def after_initialize
self.attributes.each do |k,v|
if v.class.to_s == “String” and !v.empty?
self[k] = v.iso1_to_utf8
end
end
end
def before_save
self.attributes.each do |k,v|
if v.class.to_s == “String” and !v.empty?
self[k] = v.utf8_to_iso1
end
end
end
…
end
I have problem with legacy database in MSSQL encoding.
The new app in Rails and most of the database is Utf8, while
tables from legacy MSSQL are encoded in ISO-8859-1.
I am not sure how to approach this problem. I would imagine
creation of some kind of filter with Iconv.conv(“UTF-8”, “ISO-8859-1”,
input)
for reading sql statements and the reverse for write methods.
Any suggestions how this could be done?
I think that everyone who’s ever been cornered by this can attest that
it can be a royal pain and that the options are all choosing between
lesser evils.
Last time I found myself there I exported the database to a plain text
SQL file and then ran recode [1] on it. It worked well enough but I
still found myself scanning over the file to correct tiny bits here and
there.
I’ll admit that your solution has a ring to it – being able to do it on
the fly improves the time to deploy. I like it. If however speed or
strict migration paths are an issue for you then recode may be an option
to consider.
i’m having a quite similar issue atm and i’m unsure what to do.
same thing, Legacy MSSQL Database with windows-1252 charset (so iso
compatible) on the legacy tables, and the Rails app purely using UTF-8.
i had no problems with encodings while using the sqlserver driver in ado
mode
and setting:
WIN32OLE.codepage = WIN32OLE::CP_UTF8
but unfortunately the ado driver is leaking memory so badly that the
mongrels need to be restarted every 2 hours.
when i switch to odbc mode, the memory leak is gone but with ODBC the
codepage setting doesn’t work anymore, so i get messed up texts from
those old tables.
I had problems with blob fields beeing treated as Strings as well. I
made a little change, so it is posible to exclude some fields from the
conversion.
use in model like this:
recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”, :exclude
=> [“myblob1”,“myblob2”]