I was searching for string encoding issues in Ruby. Here is a summary
of what I learnt, in case it's useful to anyone else or if anyone has any
corrections to this.
Ruby 1.8 support for encoding:
· A comment like "# -*- coding: utf-8 -*-" at the start of the
  file is supposed to determine how to parse a .rb file, but I haven’t
  really figured out how to make this work. Non-ansi characters cause an
  error while loading the file.
· ruby.exe -K<kcode> sets $KCODE (which can also be set
  programmatically).
· $KCODE affects the following:
  · Determines the encoding used to parse .rb files. Normally,
    identifiers have to be ANSI, but the limitation is removed if $KCODE is
    set to “UTF8”.
  · Affects whether inspect escapes non-ascii chars or
    leaves them as is.
  · Affects how regexps without an explicit encoding interpret the
    input string.
Ruby 1.9 support for encodings:
· Identifiers can be non-ANSI by default.
Ruby 2.0 support for encodings:
· Each string and symbol knows its own encoding, and
  String#force_encoding can change the encoding of an existing string.
· IO#encoding to control encoding to use for reading/writing
  from disk
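To make the per-string encoding and force_encoding points concrete, here is a rough sketch for a current Ruby (it will not run on 1.8). The helper names relabel and round_trip_latin1 are made up for illustration. Note also that, as far as I know, the IO accessors are actually called external_encoding and internal_encoding, and the encoding is usually chosen through the open mode string as below.

```ruby
require "tempfile"

# Relabel the bytes of +str+ without converting them (what force_encoding does).
# Hypothetical helper, not part of Ruby itself.
def relabel(str, encoding)
  str.dup.force_encoding(encoding)
end

# Write +str+ to a temporary file as ISO-8859-1 bytes, then read it back,
# asking IO to transcode from ISO-8859-1 (external) to UTF-8 (internal).
# Hypothetical helper, not part of Ruby itself.
def round_trip_latin1(str)
  Tempfile.create("latin1") do |file|
    File.binwrite(file.path, str.encode(Encoding::ISO_8859_1))
    File.open(file.path, "r:iso-8859-1:utf-8") { |io| io.read }
  end
end

text = "caf\u00E9"                      # \u escapes produce a UTF-8 string
puts text.encoding                      # the string's own encoding
puts text.length                        # counted in characters
puts text.bytesize                      # counted in bytes
puts text.to_sym.encoding               # symbols carry an encoding too

binary = relabel(text, Encoding::BINARY)
puts binary.length                      # same bytes, now one byte per character
puts round_trip_latin1(text) == text    # transcoded back to UTF-8 on read
```

The point of the sketch is that force_encoding only changes the label on the bytes, while String#encode and the "external:internal" open mode actually convert them.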
· A comment like “# -*- coding: utf-8 -*-” at the start of the
file is supposed to determine how to parse a .rb file, but I haven’t really
figured out how to make this work. Non-ansi characters cause an error while
loading the file.
Did the utf-8 file(s) you tried have a BOM or not?
If I use Notepad2’s menu to set the encoding to “UTF8 with signature”,
and run either “ruby utf8_with_signature.rb” or “ruby -Ku
utf8_with_signature.rb”, the file fails to parse. The file is attached.
If I save the file with encoding set just as “UTF8”, the file is 3 bytes
smaller. “ruby utf8.rb” fails, but “ruby -Ku utf8.rb” works. With “-Ku”,
things work even if I do not have “# -*- coding: utf-8 -*-” in the file.
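The 3 byte difference is the UTF-8 signature (BOM), the bytes EF BB BF. In case anyone wants to check their own files, here is a small sketch for a current Ruby (String#b needs 2.0 or later). The names UTF8_BOM, starts_with_bom? and strip_bom are my own, not anything built in.

```ruby
# The three-byte UTF-8 signature, held as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# True when the raw bytes begin with the UTF-8 byte order mark.
def starts_with_bom?(bytes)
  bytes.b.start_with?(UTF8_BOM)
end

# Return a binary copy of the bytes with a leading BOM removed, if present.
def strip_bom(bytes)
  raw = bytes.b
  starts_with_bom?(raw) ? raw.byteslice(UTF8_BOM.bytesize..-1) : raw
end

with_signature    = UTF8_BOM + "puts 'hi'\n".b
without_signature = "puts 'hi'\n".b

puts with_signature.bytesize - without_signature.bytesize   # 3
puts starts_with_bom?(with_signature)
puts strip_bom(with_signature) == without_signature
```

To test a real file, pass it the result of File.binread(path) so the bytes are not transcoded on the way in.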
A comment like "# -*- coding: utf-8 -*-" at the start of the
file is supposed to determine how to parse a .rb file, but I haven’t
really figured out how to make this work. Non-ansi characters cause an
error while loading the file.
Did the utf-8 file(s) you tried have a BOM or not?
AFAIK Ruby 1.8 doesn’t support magic comments that specify encodings at
all, 1.9 does. Ruby 1.8 also doesn’t recognize BOM.
Even version 1.9 has full encoding support, not just 2.0.
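For anyone who wants to see the magic comment taking effect on 1.9 or later, here is a rough sketch. It assumes a reasonably recent Ruby (Tempfile.create needs 2.1 or later), and source_encoding_for and the global $seen_source_encoding are names I made up for the demonstration. It writes a throwaway script whose first line is the magic comment, loads it, and reports what __ENCODING__ was inside that script.

```ruby
require "tempfile"

# Load a throwaway script and report the source encoding Ruby picked for it.
# The script stores __ENCODING__ in a global so the caller can read it back.
# Hypothetical helper, not part of Ruby itself.
def source_encoding_for(first_line)
  Tempfile.create(["magic_comment", ".rb"]) do |file|
    File.binwrite(file.path, "#{first_line}\n$seen_source_encoding = __ENCODING__\n")
    load file.path
    $seen_source_encoding
  end
end

puts source_encoding_for("# -*- coding: iso-8859-1 -*-")
puts source_encoding_for("# -*- coding: utf-8 -*-")
```

The comment has to be on the first line of the file (or the second, after a shebang line) to be picked up.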