Problems with utf8 + String class

Hi,

There is a problem I've been trying to solve for a couple of hours now,
and after some fruitless googling and searching around, I haven't come up
with anything substantial, so I thought the forum might help me.

Say I have a string and want to display only the first 10 characters or
so:

shortstring = "this is a very long string object"[0...10]

shortstring = "this is a " # which is great

But if I use the same method on a UTF-8 string, I get some weird
characters popping in there, sometimes yes, sometimes no. From looking
around, it seems that because a non-ASCII character takes two or more
bytes (as opposed to 1 in a single-byte encoding), [0...10] sometimes cuts
through the middle of a character and leaves a stray byte behind, which
shows up as a weird character. (The same goes for the String#slice method.)

The final result I need, in essence, is this:

“very long string in utf8” to become
“very lon…”

without weird characters.
Any help is much appreciated.

thanks,
harp

Ruby's String methods (in Ruby 1.8, at least) treat a string as a plain
sequence of bytes and assume one byte per character, which, as you know,
is not the case with Unicode strings. A multibyte character anywhere in
your string is therefore going to throw every index after it off.
Such is the nature of Ruby.
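
If you just need the truncation and nothing else, here is a rough sketch of
one way around it using only core Ruby. The helper name truncate_utf8 and
its parameters are made up for this example (they are not part of Ruby or
Rails), and it assumes the input is valid UTF-8. It counts code points, so a
base letter followed by a separate combining accent still counts as two.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Ruby or Rails): keep at most `limit`
# characters of a UTF-8 string and append `tail` if anything was cut off.
def truncate_utf8(text, limit, tail = "...")
  # unpack("U*") decodes the UTF-8 bytes into an array of code points,
  # so slicing the array counts characters rather than bytes.
  codepoints = text.unpack("U*")
  return text if codepoints.length <= limit

  # pack("U*") turns the kept code points back into a UTF-8 string.
  codepoints[0, limit].pack("U*") + tail
end

puts truncate_utf8("very long string in utf8", 8)   # prints "very lon..."
puts truncate_utf8("très long texte en utf8", 8)    # prints "très lon..."
```

Because the cut happens between code points and never between bytes, no
character gets split in half, which is where the weird characters come from.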

As a starting point, I suggest you check out:

http://wiki.rubyonrails.org/rails/pages/HowToUseUnicodeStrings
http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks

Chris