Hi,
I’m reading a CSV file that contains UTF-8 characters with JRuby 1.6.7
in 1.9 mode, and I’m getting ASCII-8BIT strings instead of UTF-8 ones.
Example:
#---------------------------------
CSV.foreach("#{wb_filepath}", { :encoding => "UTF-8:UTF-8", :headers =>
true, :return_headers => true, :col_sep => ',' } ) do |row|
  # :encoding => "UTF-8:UTF-8" should not be necessary anyhow because
  # Encoding.default_external = 'UTF-8'.
  # I just added it to test if it would help. It didn't.
  #…
  test_string = row.field('test header')
  puts "test_string: #{test_string.encoding.name}"
  # ==> here I get 'test_string: ASCII-8BIT'.
  #…
end
#---------------------------------
Am I doing something wrong?
Thanks,
Manfred
Manfred,
Can you also provide a data file for us to run this on? It always
helps to use real data…
-Tom
On Wed, Mar 14, 2012 at 12:16 PM, Manfred U.
[email protected] wrote:
To unsubscribe from this list, please visit:
http://xircles.codehaus.org/manage_email
–
blog: http://blog.enebo.com twitter: tom_enebo
mail: [email protected]
Hi Tom,
I’ve attached a small test case.
Thanks,
Manfred
On 14.03.2012 at 21:47, Thomas E Enebo wrote:
On 14.03.2012 at 23:30, Don W. wrote:
Do you have ‘# coding: UTF-8’ at the top of the script that parses
the CSV? This has bitten me in the past as well…
Yes. Moreover, I’ve also set default_external to UTF-8, which should be
used when files are read. At least this is my understanding. But CSV
seems to ignore it and treats the content as ASCII-8BIT (binary). Or I’m
just doing something wrong. Did you have a look at the small example
attached to my previous mail?
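For reference, here is a minimal sketch of the behaviour expected from a Ruby 1.9-compatible interpreter without this bug. The `# coding` magic comment only sets the encoding of string literals in that source file, while strings returned by file reads are tagged according to `Encoding.default_external` or an explicit `:encoding` option. The helper name `read_encodings` and the temp file contents are assumptions made for illustration; they are not taken from the attached test case.

```ruby
require 'tempfile'

# Hypothetical helper: read the same file three ways and report the
# Encoding object each resulting String is tagged with.
def read_encodings(path)
  {
    :default  => File.read(path).encoding,                       # follows Encoding.default_external
    :explicit => File.read(path, :encoding => 'UTF-8').encoding,  # per-call override
    :binary   => File.binread(path).encoding                      # binary read, always ASCII-8BIT
  }
end

Encoding.default_external = 'UTF-8'

# Assumed sample data: one header line and one value containing "é" as UTF-8 bytes.
Tempfile.create('encoding_demo') do |file|
  file.binmode
  file.write("test header\ncaf\xC3\xA9\n")
  file.flush
  read_encodings(file.path).each { |how, enc| puts "#{how}: #{enc.name}" }
end
```

If the default and explicit reads report UTF-8 here but CSV fields still come back as ASCII-8BIT, that points at the CSV layer rather than at the encoding settings.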
Btw., my current workaround is to call force_encoding(‘UTF-8’) for
every csv string field value.
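Roughly, that workaround can be sketched as follows. The helper name `utf8_row` and the sample row are made up for illustration, and the binary-tagged field only simulates what 1.6.7 returns. `force_encoding` changes the encoding label only and leaves the bytes alone, so this is safe only when the file really is UTF-8.

```ruby
require 'csv'

# Hypothetical helper: return a copy of a CSV::Row in which every String
# field is re-tagged as UTF-8. Non-String fields (e.g. nil) pass through.
def utf8_row(row)
  fields = row.fields.map do |value|
    value.is_a?(String) ? value.dup.force_encoding(Encoding::UTF_8) : value
  end
  CSV::Row.new(row.headers, fields, row.header_row?)
end

# Simulated input: a field holding UTF-8 bytes but tagged as binary.
binary_field = "caf\xC3\xA9".dup.force_encoding(Encoding::BINARY)
raw_row = CSV::Row.new(['test header'], [binary_field])

fixed_row = utf8_row(raw_row)
puts "before: #{raw_row.field('test header').encoding.name}"
puts "after:  #{fixed_row.field('test header').encoding.name}"
```

Working on a copy keeps the original row untouched, which makes it easy to drop the helper again once the fix in JRuby is available.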
Manfred
Do you have ‘# coding: UTF-8’ at the top of the script that parses the
CSV? This has bitten me in the past as well…
This is fixed on master, but I can confirm it is broken on 1.6.7, so
1.7.0 will address this. If you really need this now, you can bisect
to the changeset which fixed it on master and create a patch for the 1.6
branch… or just wait for 1.7.0 (May timeframe).
-Tom
On Thu, Mar 15, 2012 at 2:56 AM, Manfred U.
[email protected] wrote:
Actually, getting master and trying your stuff out would be helpful in
case there is something else which may be broken (plus it would be
another confirmation master fixes this problem).
-Tom
On Thu, Mar 15, 2012 at 9:12 AM, Thomas E Enebo [email protected]
wrote: