In my opinion, Ruby is practically useless for many applications without
proper Unicode support. How a modern language can ignore this issue is
really beyond me.
Is there a plan to get Unicode support into the language anytime soon?
Hi,
In message "Re: Unicode roadmap?"
on Wed, 14 Jun 2006 06:13:03 +0900, Roman H.
[email protected] writes:
|In my opinion, Ruby is practically useless for many applications without
|proper Unicode support. How a modern language can ignore this issue is
|really beyond me.
Define "proper Unicode support" first.
|Is there a plan to get Unicode support into the language anytime soon?
I'm planning to enhance Unicode support in 1.9 in a year or so
(finally). But I'm not sure that conforms to your definition of "proper
Unicode support". Note that 1.8 handles Unicode (UTF-8) if your
string operations are based on Regexp.
matz.
On Jun 13, 2006, at 6:34 PM, Pete wrote:
Define "proper Unicode support" first.
having an unicode-equivalent for all methods of class String
like size, slice, upcase
E.g. I tried the unicode plugin... but, alas, who wants to write
stuff like "normalize_KC" etc. if you just want the frickin'
substring of a string?!
def substring(str, start, len)
md = str.match(/\A.{#{start}}(.{#{len}})/)
md[1]
end
def strlength(str)
n = 0
str.gsub(/./m) { n += 1; $& }
n
end
See! Regexps do everything!
Just you know, set $KCODE and use these methods and you are set!
(I am kidding... btw)
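For anyone who wants to try the joke above, here is a minimal, self-contained sketch of the same regexp trick. It assumes Ruby 1.9 or later, where string literals carry their own encoding, so no $KCODE is set; the sample word and the added /m flag are my own choices and not part of the original post.

```ruby
# Sketch only: assumes Ruby 1.9+ with UTF-8 string literals (no $KCODE needed).

# Return len characters of str, starting at character offset start.
# Returns nil when the string is too short to match.
def substring(str, start, len)
  md = str.match(/\A.{#{start}}(.{#{len}})/m)
  md && md[1]
end

# Count characters by visiting every single-character regexp match.
def strlength(str)
  n = 0
  str.scan(/./m) { n += 1 }
  n
end

word = "Grüße"               # sample input, chosen for illustration
puts substring(word, 2, 3)   # üße
puts strlength(word)         # 5 characters
puts word.bytesize           # 7 bytes, for contrast
```

The gap between the last two numbers is the whole complaint in this thread: a byte count is not a character count.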
Define "proper Unicode support" first.
having an unicode-equivalent for all methods of class String
like size, slice, upcase
E.g. I tried the unicode plugin... but, alas, who wants to write stuff
like "normalize_KC" etc. if you just want the frickin' substring of a
string?!
you need to read books on unicode just to properly use the plugin...
aargg :-((
Best regards
Peter
Yukihiro M. wrote:
From the theoretical point of view this is quite interesting. Also I
understand the humor
Performance and memory consumption should be breathtaking using regexp
just everywhere...
Also there are a few methods left
As I am German the "missing" unicode support is one of the greatest
obstacles for me (and probably all other Germans doing their stuff
seriously)...
Logan C. wrote:
From: Pete [mailto:[email protected]]
Sent: Wednesday, June 14, 2006 1:58 AM
As I am German the "missing" unicode support is one of the greatest
obstacles for me (and probably all other Germans doing their stuff
seriously)...
The same is for Russians/Ukrainians. In our programming communities the question
"does the programming language support Unicode as 'native'?" has very high
priority.
/BTW, here is one of the things where Python beats Ruby completely
V.
I suspect the Japanese posters on this list can answer better than I can,
but my impression is that Unicode is, shall we say, not highly thought of
outside Europe and North America. The way they dealt with "Chinese"
characters was apparently more than a bit of a hack, and just doesn't work
very well in the real world. Reading some of the explanations for glyphs
versus characters in Unicode just makes you shake your head. What were they
thinking? Sure doesn't pass the smell test, although I'll be the first to
admit I haven't exactly thought deeply about the subject.
There's another problem with Japanese - I've got a friend who's been dealing
with some issues around the fact that Japanese apparently innovates new
characters on a regular basis, and everyone is expected to use the new
characters. (I believe this is called gaiji). The concept of a fixed
character set apparently just isn't a good idea to start with.
[Awaiting corrections from people who actually know something about this
topic :-)...]
On Jun 13, 2006, at 7:56 PM, James M. wrote:
topic :-)...]
I have one Japanese person here who's never heard of this gaiji
concept. But it could be new and behind a generation gap of some
kind. They do sure like to add symbols where they can, though.
Especially graphical star characters. I see that a lot.
-Mat
On 6/14/06, James M. [email protected] wrote:
with some issues around the fact that Japanese apparently innovates new
characters on a regular basis, and everyone is expected to use the new
characters. (I believe this is called gaiji). The concept of a fixed
character set apparently just isn't a good idea to start with.
[Awaiting corrections from people who actually know something about this
topic :-)...]
There is a good summary of the Han unification controversy on Wikipedia:
http://en.wikipedia.org/wiki/Han_unification
Hi,
In message "Re: Unicode roadmap?"
on Wed, 14 Jun 2006 08:11:49 +0900, "Victor S."
[email protected] writes:
|From: Pete [mailto:[email protected]]
|Sent: Wednesday, June 14, 2006 1:58 AM
|> As I am German the "missing" unicode support is one of the greatest
|> obstacles for me (and probably all other Germans doing their stuff
|> seriously)...
|
|The same is for Russians/Ukrainians. In our programming communities the question
|"does the programming language support Unicode as 'native'?" has very high
|priority.
Alright, then what specific features are you (both) missing? I don't
think it is a method to get number of characters in a string. It
can't be THAT crucial. I do want to cover "your missing features" in
the future M17N support in Ruby.
matz.
From: Yukihiro M. [mailto:[email protected]]
Sent: Wednesday, June 14, 2006 5:37 AM
|The same is for Russians/Ukrainians. In our programming communities
matz.
I suppose, all we (non-English-writers) need is to have all string-related
methods working. Just for now, I think about plain testing each string
method; also, some other classes can be affected by Unicode (possibly
regexps, and paths). Regexps seem to work fine (in my 1.9), but paths are
not: File.open with Russian letters in path doesn't find the file.
More generally, it can make sense to have Unicode as the "base" mode, where
non-Unicode stays as an "old, compatibility" mode.
Something like this.
V.
Hi,
In message "Re: Unicode roadmap?"
on Wed, 14 Jun 2006 14:26:30 +0900, "Victor S."
[email protected] writes:
|I suppose, all we (non-English-writers) need is to have all string-related
|methods working. Just for now, I think about plain testing each string
|method;
In that sense, I am one of the non-English-writers, so that I can
suppose I know what we need. And I have no problem with the current
UTF-8 support. Maybe that's because Japanese don't have cases in our
characters. Or maybe I'm missing something. Can you show us your
concrete problems caused by Ruby's lack of "proper" Unicode support?
|also, some other classes can be affected by Unicode (possibly
|regexps, and paths). Regexps seem to work fine (in my 1.9), but paths are
|not: File.open with Russian letters in path doesn't find the file.
Strange. Ruby does not convert encoding, so that there should be no
problem opening files, if you are using strings in the encoding your OS
expects. If they differ, you have to specify (and convert) them
properly, no matter how Unicode support is.
matz.
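To make the "specify (and convert)" point concrete, here is a rough sketch of converting a name between encodings. It relies on String#encode, which only exists in Ruby 1.9 and later (at the time of this thread a library such as Iconv would be the likely route), and the file name is invented for illustration.

```ruby
# Sketch only: String#encode is Ruby 1.9+; the file name is hypothetical.
utf8_name   = "тест.txt"                        # UTF-8 string literal
cp1251_name = utf8_name.encode("Windows-1251")  # same text, different bytes

puts utf8_name.bytesize      # 12: four 2-byte Cyrillic letters plus ".txt"
puts cp1251_name.bytesize    # 8: one byte per character
puts cp1251_name.encoding    # Windows-1251

# The byte sequences differ, so a name handed over in the wrong encoding
# would not match what is stored on disk.
puts utf8_name.bytes == cp1251_name.bytes   # false
```

Whether this is what actually goes wrong with File.open on Windows is not established in this thread; the sketch only shows why the bytes have to agree with what the OS expects.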
Roman H. wrote:
In my opinion, Ruby is practically useless for many applications without
proper Unicode support. How a modern language can ignore this issue is
really beyond me.
Is there a plan to get Unicode support into the language anytime soon?
I also think that this is very important.
On Jun 14, 2006, at 15:56 , Victor S. wrote:
As mentioned in this topic, it's String#length, upcase, downcase,
capitalize.
Just to chime in, aren't upcase, downcase, and capitalize a locale/
localization issue rather than a Unicode-only issue per se? For
example, different languages will have different rules for
capitalization. Or am I wrong? Does Unicode in and of itself address
these issues?
Granted, proper support for upcase, downcase, and capitalize is
important, but I think it's a separate issue, part of m17n as a whole
rather than support for Unicode in particular.
Michael G.
grzm seespotcode net
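A small sketch of why case mapping is partly a language question rather than a pure encoding one. It assumes Ruby 2.4 or later, where upcase is Unicode-aware and accepts a :turkic option; none of this was available when the thread was written, and the sample words are my own.

```ruby
# Sketch only: assumes Ruby 2.4+ (Unicode-aware case mapping).
puts "straße".upcase        # STRASSE: one character becomes two
puts "привет".upcase        # ПРИВЕТ: Cyrillic has a plain one-to-one mapping
puts "i".upcase             # I: the default mapping
puts "i".upcase(:turkic)    # İ: Turkish/Azeri rule, capital I with a dot
```

The last two lines take the same input and give different answers, so the caller has to say which language rules apply.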
From: Yukihiro M. [mailto:[email protected]]
Sent: Wednesday, June 14, 2006 9:35 AM
In that sense, I am one of the non-English-writers,
Sorry, Matz, I know, of course. But I know too little about Japanese to see
how close our tasks are. Under "non-English-writers" I, maybe, had to say
"European languages" or so - which have common punctuation, LTR writing,
"words" and "whitespaces" and so on. I have almost no knowledge about
Japanese, Korean, Arabic, Hebrew people's needs.
so that I can
suppose I know what we need. And I have no problem with the current
UTF-8 support. Maybe that's because Japanese don't have cases in our
characters. Or maybe I'm missing something.
Just what I've said above.
Can you show us your
concrete problems caused by Ruby's lack of "proper" Unicode support?
As mentioned in this topic, it's String#length, upcase, downcase,
capitalize.
BTW, does String#length works good for you?
Moreover, there seem to be some huge problems with paths having Russian
letters; but I'm really not convinced that Ruby really has to handle this.
|also, some other classes can be affected by Unicode (possibly
|regexps, and paths). Regexps seem to work fine (in my 1.9), but paths are
|not: File.open with Russian letters in path doesn't find the file.
Strange. Ruby does not convert encoding, so that there should be no
problem opening files, if you are using strings in the encoding your OS
expects. If they differ, you have to specify (and convert) them
properly, no matter how Unicode support is.
Oh, it's a bit hard theme for me. I know Windows XP must support Unicode
file names; I see my filenames in Russian, but I have low knowledge of
system internals to say, are they really Unicode?
Setting those problems aside, only the String problems remain, but
they are such basic core methods!
V.
Hi,
As mentioned in this topic, it's String#length, upcase, downcase,
capitalize.
BTW, does String#length works good for you?
To have the length of a Unicode string, just do str.split(//).length,
or "require 'jcode'" at the beginning of your code.
For the other functions, try looking at the unicode library
http://www.yoshidam.net/Ruby.html#unicode
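A quick sketch of the split(//) suggestion. It assumes Ruby 1.9 or later with a UTF-8 string literal; on 1.8 the result would depend on $KCODE being set, which this sketch does not cover. The sample string is my own.

```ruby
# Sketch only: assumes Ruby 1.9+ and a UTF-8 string literal.
str = "привет"

chars = str.split(//)   # an empty pattern splits between characters
puts chars.length       # 6 characters
puts str.bytesize       # 12 bytes: each Cyrillic letter takes 2 bytes in UTF-8
puts chars.first        # п, the first character as its own string
```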
Oh, it's a bit hard theme for me. I know Windows XP must support Unicode
file names; I see my filenames in Russian, but I have low knowledge of
system internals to say, are they really Unicode?
Windows XP does support Unicode file names, but I'm not sure you can
use them with Ruby (I do not use Ruby much under Windows). Try
converting the file names to your current locale, it should work if
the file names can be converted to it. What I mean is that Russian
file names encoded in the Windows Russian encoding should work on a
Russian PC.
Hope this helps,
Cheers,
Vincent ISAMBART
Hi,
In message "Re: Unicode roadmap?"
on Wed, 14 Jun 2006 15:56:02 +0900, "Victor S."
[email protected] writes:
|> Can you show us your
|> concrete problems caused by Ruby's lack of "proper" Unicode support?
|
|As mentioned in this topic, it's String#length, upcase, downcase,
|capitalize.
OK. Case is the problem. I understand.
|BTW, does String#length works good for you?
I don't remember the last time I needed length method to count
character numbers. Actually I don't count string length at all both
in bytes and characters in my string processing. Maybe this is a
special case. I am too optimized for Ruby string operations using
Regexp.
|Oh, it's a bit hard theme for me. I know Windows XP must support Unicode
|file names; I see my filenames in Russian, but I have low knowledge of
|system internals to say, are they really Unicode?
Windows 32 path encoding is a nightmare. Our Win32 maintainers are often
troubled by unexpected OS behavior. I am sure we can handle Russian
path names, but we need help from Russian people to improve.
matz.
From: Michael G. [mailto:[email protected]]
Sent: Wednesday, June 14, 2006 10:08 AM
On Jun 14, 2006, at 15:56 , Victor S. wrote:
As mentioned in this topic, it's String#length, upcase, downcase,
capitalize.
Just to chime in, aren't upcase, downcase, and capitalize a locale/
localization issue rather than a Unicode-only issue per se? For
example, different languages will have different rules for
capitalization.
Really? I know about two cases: European capitalization and no
capitalization.
But, really, you may be right. I suppose, Florian G. can say something
about German-specific capitalization issues.
Granted, proper support for upcase, downcase, and capitalize is
important, but I think it's a separate issue, part of m17n as a whole
rather than support for Unicode in particular.
Maybe. Generally, sometimes I want Unicode, and sometimes (for "quick
dirty" scripts) I'd prefer that capitalization and regexps "just work" with
Windows-1251 (one-byte Russian encoding).
V.
From: Vincent I. [mailto:[email protected]]
Sent: Wednesday, June 14, 2006 10:14 AM
As mentioned in this topic, it's String#length, upcase, downcase,
capitalize.
BTW, does String#length works good for you?
To have the length of a Unicode string, just do str.split(//).length,
or "require 'jcode'" at the beginning of your code.
For the other functions, try looking at the unicode library
http://www.yoshidam.net/Ruby.html#unicode
I know about it. But, theoretically speaking, such "core" methods must be
in core. No?
properly, no matter how Unicode support is.
Russian PC.
Yes, they work. But I can't settle the question: does Ruby's Unicode
support need to include filename operations?
V.
Yukihiro M. wrote:
Hi,
In message "Re: Unicode roadmap?"
on Wed, 14 Jun 2006 06:13:03 +0900, Roman H. [email protected] writes:
|In my opinion, Ruby is practically useless for many applications without
|proper Unicode support. How a modern language can ignore this issue is
|really beyond me.
Define "proper Unicode support" first.
I won't define "proper Unicode support" here.
But there must be a problem somewhere since pure-ruby Ferret doesn't
support UTF-8. You need to use the c-extension of Ferret to have it
support UTF-8 (which doesn't work on Windows yet). I don't know if
that is just a sucky impl of Ferret or if it's Ruby that makes it so.
Maybe Dave Balmain can enlighten us why UTF-8 doesn't work in the pure
Ruby version and what is needed of Ruby to make it work (if it's
actually Ruby's fault that is)?
My personal belief is that it should just work in a case like this if
data in is UTF-8 and search strings are UTF-8 without the lib author
and/or user having to do anything very special to make it work (apart
from specifying encoding). Am I wrong in this?
Regards,
Marcus