I want to use Ruby to read an Excel file's content and convert it to
UTF-8.
However, the file contains text in many different languages, such as
Greek, Japanese, Korean, Russian and so on.
So I use Iconv to convert it to UTF-8.
I searched the internet, and some articles said the default charset of
Excel is UTF-16LE.
So I used the code below:
After I run it, I get an error:
in `conv': ")" (Iconv::InvalidCharacter)
It seems that in UTF-16, the ( is not '('???
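One possible reading of this error (an assumption, since the code that raised it is not shown): UTF-16LE uses two bytes for a character like `)`, so a lone single-byte `)` is not valid UTF-16LE input. The sketch below uses `String#encode` and `String#force_encoding`, which need Ruby 1.9 or later (`String#b` needs 2.0); the Iconv call for Ruby 1.8 appears only as a comment.

```ruby
# In UTF-16LE a basic character takes two bytes, low byte first.
paren_utf16 = ")".encode("UTF-16LE")
utf16_bytes = paren_utf16.bytes          # 0x29 followed by 0x00

# A single-byte ")" relabelled as UTF-16LE is only half a code unit,
# so it is not valid input for a UTF-16LE converter.
lone_byte = ")".b.force_encoding("UTF-16LE")
lone_byte_valid = lone_byte.valid_encoding?

# Real UTF-16LE data converts back to UTF-8 without trouble.
back_to_utf8 = paren_utf16.encode("UTF-8")

# Ruby 1.8 equivalent of the last step (Iconv.conv takes the target first):
#   Iconv.conv("UTF-8", "UTF-16LE", raw_bytes)
```

If the strings handed to Iconv are already single-byte text, labelling them as UTF-16 would fail in this way.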
Then I changed 'UTF-16' to 'GB2312' (the default charset of my
system), but it cannot convert the Korean characters correctly. All the
Korean characters became ???
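The `???` output is what a lossy conversion looks like: a Chinese code page has no slots for Hangul, so each Korean character gets replaced. A minimal sketch, assuming Ruby 1.9 or later and using GBK as a stand-in for the Windows code page that is usually labelled GB2312:

```ruby
korean = "한국어"   # three Hangul syllables, held as UTF-8

# GBK has no Hangul, so every syllable is undefined in the target
# encoding; with :undef => :replace each one becomes a single "?".
lossy = korean.encode("GBK", :undef => :replace)

# Once the text is "???", the original characters cannot be recovered.
recovered = lossy.encode("UTF-8")
```

This suggests the Korean text has to be kept in an encoding that can represent it (EUC-KR, UTF-16 or UTF-8) at every step.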
Date: Mon, 15 Sep 2008 17:49:15 +0900
From: "Wu Nan" [email protected]
To: [email protected]
Subject: How to convert the charset of texts in an Excel which has multi-language text and charset?
I use Ruby 1.8.6 on WinXP Sp3.
How can I resolve it?
Many thanks,
Nan
Dear Nan,
after some searching, I found that there is a special encoding for
Korean characters, EUC-KR.
I managed to convert your Korean text from UTF-8 to EUC-KR, write it to
a file and display it correctly in Firefox once the right encoding is
set in the Preferences (EUC-KR in this case, but I can also display
Korean text in UTF-8).
So I think you'll be successful by making sure you convert from EUC-KR
to UTF-8 for the Korean, and from its own source encoding to UTF-8 for
everything else.
I just tested it again. I dumped the original text and displayed it as
integers.
I found that all the Korean characters became '???' as soon as they
were read out from the Excel file.
I attached the test code and the test Excel file. The Excel file
contains only one text.
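Dumping the cell as integers is a good way to tell real Korean bytes from text that was already replaced on the way out of Excel. A sketch of that check; `already_replaced?` is a hypothetical helper, and the sample strings are made up for illustration:

```ruby
# "?" is byte 63, so a cell that reads as "???" dumps as [63, 63, 63].
# Hypothetical helper: true when the text consists only of "?" bytes.
def already_replaced?(text)
  bytes = text.unpack("C*")
  !bytes.empty? && bytes.all? { |b| b == 63 }
end

damaged = "???"
intact  = "한국어".encode("EUC-KR")   # real Korean bytes, two per syllable

damaged_dump = damaged.unpack("C*")
intact_dump  = intact.unpack("C*")
```

If the dump shows only 63s, the characters were lost before any Iconv call ran, so the fix belongs in the step that reads the cell.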
Date: Mon, 15 Sep 2008 18:50:03 +0900
From: "Wu Nan" [email protected]
To: [email protected]
Subject: Re: How to convert the charset of texts in an Excel which has multi-language text and charset?
Do you have any idea about the reason?
Dear Nan,
right now I am not on Windows, so in order to check whether the problem
is with Windows or with Ruby, I'd suggest you try the following (which
works on Ubuntu with your data).
# now you can either iterate over all rows, skipping the first number of
# rows (in case you know they just contain column headers)
skip = 0
worksheet.each(skip) { |row|
  # a row is actually just an Array of Cells...
  first_cell = row.at(0)
  p 'first'
  p first_cell
  # how you get data out of the cell depends on what datatype you
  # expect:
  # if you expect a String, you can pass an encoding and (iconv
  # required) the content of the cell will be converted.
  str = row.at(0).to_s('EUC-KR')
  p str
  f = File.open("textexcel.html", "w")
  f.puts str
  f.close
}
I could open the file textexcel.html with correctly displayed Korean
characters (now in EUC-KR, but you can convert these to UTF-8), at
least on Ubuntu.
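The final step, turning the EUC-KR output into UTF-8, could look like this on Ruby 1.9.3 or later. This is only a sketch: the file names are placeholders created in a temporary directory, and the sample text stands in for the file written by the loop above.

```ruby
require "tmpdir"

utf8_text = nil
Dir.mktmpdir do |dir|
  euc_path  = File.join(dir, "euc_kr.html")   # placeholder file names
  utf8_path = File.join(dir, "utf8.html")

  # Stand-in for the EUC-KR file produced earlier.
  File.binwrite(euc_path, "한국어".encode("EUC-KR"))

  # "r:EUC-KR:UTF-8" reads EUC-KR bytes and transcodes them to UTF-8.
  text = File.open(euc_path, "r:EUC-KR:UTF-8") { |f| f.read }

  # Write the converted text back out as UTF-8 and read it again.
  File.open(utf8_path, "w:UTF-8") { |f| f.write(text) }
  utf8_text = File.open(utf8_path, "r:UTF-8") { |f| f.read }
end
```

On Ruby 1.8.6, which has no encoding-aware file modes, the same conversion would go through `Iconv.conv("UTF-8", "EUC-KR", data)` on the raw file contents.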