I’m using the rtranslate gem (sishen-rtranslate) to handle translating
some text. It uses open-uri to scrape Google Translate. If I attempt
to translate text from English to Spanish it outputs some garbage
characters.
Translating the word “where” returns “dónde” instead of “dónde”
Any idea why this is happening and what I can do to fix this?
You need to specify the encoding in your Ruby script. Ruby (1.8 at
least; I am not certain about 1.9) will use your system encoding for
strings by default.
Set this global variable in your script to make Ruby process strings as
UTF-8, independently of your machine:
$KCODE = 'u'
Note that Ruby could be processing the Google Translate output correctly
(i.e. you are doing everything above), but if you are outputting the
result to the console/stdout (via puts), your machine may still display
the UTF-8 text according to the host system's encoding. This, for
instance, is a particularly annoying problem on Windows systems in my
experience. Output to a file instead if you are seeing this problem.
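One way to tell a display problem from a data problem is to look at the raw bytes rather than the rendered characters. The following is only a minimal sketch: hex_bytes is a hypothetical helper (it is not part of rtranslate), and the two sample strings are built by hand here rather than taken from Google Translate.

```ruby
# Hypothetical helper: show a string's raw bytes as hex, so the result
# does not depend on how the terminal renders non-ASCII characters.
def hex_bytes(str)
  str.unpack('C*').map { |b| format('%02x', b) }.join(' ')
end

# "dónde" encoded as UTF-8: the accented o takes two bytes (c3 b3).
utf8_sample = [0x64, 0xC3, 0xB3, 0x6E, 0x64, 0x65].pack('C*')

# The same word encoded as ISO-8859-1: the accented o is one byte (f3).
latin1_sample = [0x64, 0xF3, 0x6E, 0x64, 0x65].pack('C*')

puts hex_bytes(utf8_sample)   # 64 c3 b3 6e 64 65
puts hex_bytes(latin1_sample) # 64 f3 6e 64 65
```

If the translated string dumps as the first pattern, the data is valid UTF-8 and the problem is most likely in whatever displays it. Any other pattern suggests the bytes were altered before they reached your script's output.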
I had the $KCODE variable set. It didn’t seem to do anything in this
case. I outputted the translated text to a file to see if it was a
display issue with the console and the text was still incorrect in the
file.
The amusing part is that the first one looks fine to me.
I suspect this means that you’re getting UTF8 when you expect something
in some particular encoding, so you should be specifying encodings…
Where would I specify the encoding to fix this problem? And yes, I just
noticed that in a reply it DOES look correct, but when I posted it, it
was not. I’m guessing somewhere (other than $KCODE) I need to set it as
UTF-8.
Are you using Ruby 1.8 or Ruby 1.9? In 1.9, you should do
string.force_encoding("UTF-8") or
string.force_encoding("ISO-8859-1")… as needed. In Ruby 1.8, I
think it just works with the bits you provide it and it’s your
terminal that determines what actually gets displayed.
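As a rough illustration of the 1.9 approach (this needs Ruby 1.9 or later, so it will not run on 1.8), the sketch below relabels the same bytes with force_encoding and shows how picking the wrong label produces garbage once the string is transcoded. The byte values are typed in by hand as an assumption about what a UTF-8 response would contain; they are not captured from rtranslate.

```ruby
# Raw bytes for "dónde" in UTF-8. pack('C*') yields a binary
# (ASCII-8BIT) string, similar to unlabelled bytes read from a socket.
raw = [0x64, 0xC3, 0xB3, 0x6E, 0x64, 0x65].pack('C*')

# Correct label: force_encoding changes only the tag, not the bytes.
utf8 = raw.dup.force_encoding('UTF-8')
puts utf8.valid_encoding?  # true
puts utf8.length           # 5 characters

# Wrong label: treating the bytes as ISO-8859-1 and converting to UTF-8
# turns the two-byte accented o into two separate characters.
wrong = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts wrong.length          # 6 characters

# The damage is reversible as long as no bytes were lost on the way.
repaired = wrong.encode('ISO-8859-1').force_encoding('UTF-8')
puts repaired == utf8      # true
```

The point of the sketch is that force_encoding never rewrites data, so it only helps when you already know which encoding the bytes are in.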
I’m using 1.8.7. I don’t think it’s the terminal but I’m not entirely
sure. I’m outputting the translation to a text file, but technically
I’m viewing it in a terminal app (Putty) so it may be screwing up there.
Aha! Well that fixed the problem of being able to see the correct
output in the terminal. It should greatly help the debugging process
now. I’m then taking the encoded string and transferring it with XML via
a socket connection. I’ll have to look into the transfer to see if it’s
breaking there.