I posted this to ruby-talk, but it occurred to me that you folks
implementing Rails functionality probably have a thing or two to say
about Unicode support in Ruby. Therefore, I would love to hear your
opinions.

Adding native Unicode support to JRuby is only a matter of time; its
usefulness as a JVM-based language depends on it. However, we continue
to wrestle with how best to support Unicode without stepping on the Ruby
community’s toes in the process. Thoughts?
---------- Forwarded message ----------
From: Charles O Nutter [email protected]
Date: Jun 14, 2006 7:11 PM
Subject: Re: Unicode roadmap?
To: ruby-talk ML [email protected]

Every time these Unicode discussions come up my head spins like a top.
You should see it.

We JRubyists get headaches from the Unicode question too. Since JRuby is
currently 1.8-compatible, we do not have what most would call native
Unicode support. This is primarily because we do not wish to create an
incompatible version of Ruby, or to build in Unicode support now that
would conflict with Ruby 2.0 in the future. It is, however, embarrassing
to say that although we run on top of Java, which has arguably pretty
good Unicode support, we don’t support Unicode. Perhaps you can see our
conundrum.

I am no Unicode expert. I know that Java uses UTF-16 strings internally,
converted to and from the current platform’s encoding of choice by
default. It can also convert those UTF-16 strings to and from just about
every encoding out there, just by telling it to do so. Java supports the
Unicode specification version 3.0. So Unicode is not a problem for Java.
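
To make the Java side of this concrete, here is a minimal sketch of what
"UTF-16 internally, any encoding on request" looks like. It is not code
from JRuby. It assumes a current JDK (java.nio.charset.StandardCharsets
only arrived in Java 7), and the class and method names are made up for
illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class EncodingSketch {
    // Number of UTF-16 code units Java stores for the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the same text occupies once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Decode bytes from one encoding into a (UTF-16) String, then
    // re-encode that String into another encoding.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";               // "café"
        System.out.println(utf16Units(cafe));    // 4
        System.out.println(utf8Bytes(cafe));     // 5

        String clef = "\uD834\uDD1E";            // U+1D11E, outside the BMP
        System.out.println(utf16Units(clef));    // 2
        System.out.println(codePoints(clef));    // 1
    }
}
```

The gap between the byte count and the character count is the whole
problem in miniature: an API that treats a string as a byte vector gives
one answer, and an API that treats it as text gives another.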

We would love to be able to support Unicode in JRuby, but there’s always
that nagging question of what it should look like and what would mesh
well with the Ruby community at large. With the underlying platform
already rich with Unicode support, it would not take much effort to
modify JRuby. So then there’s a simple question:

What form would you, the Ruby users, want Unicode to take? Is there a
specific library that you feel encompasses a reasonable implementation
of Unicode support, e.g. icu4r? Should the support be transparent, e.g.
no longer treating or assuming strings are byte vectors? JRuby, because
we use Java’s String, is already using UTF-16 strings exclusively;
however, there’s no way to get at them through core Ruby APIs. What
would be the most comfortable way to support Unicode now, considering
where Ruby may go in the future?

--
Charles Oliver N. @ headius.blogspot.com
JRuby Developer @ jruby.sourceforge.net
Application Architect @ www.ventera.com