Capital Cyrillic letter in Ruby class name (UTF-8)

Hi Matz,
I am not happy using Ruby.

All I want is to generate a class by name from an XML structure. The
name is a correct class name that starts with a capital Cyrillic letter,
but this is impossible.
Error:
class/module name must be CONSTANT

So I have a suitable binding of our XML structure in Python, and I would
like to have the same in Ruby. Is it so hard to allow people from other
countries to use their language and capital letters?

I don’t want to create class RДокумент! (first ASCII, then UTF-8
Cyrillic)
I need a simple way to make classes like this: class Документ.
Class names beginning with a capital Latin letter make the developer
switch keyboard language with endless Ctrl+Shift.
This makes developers completely unhappy, and so they go from the Ruby
binding to Python.

We write business logic mostly in C++, with bindings to Ruby/Python.
All we want is to load an XML structure of classes, methods and properties
from a file to generate classes in Ruby/Python.


On 17.04.2012 17:05, Vladimir K. wrote:

I need a simple way to make classes like this: class Документ.

It’s perfectly possible.

irb(main):001:0> Документ = Class.new
=> #<Class:0x00000001b5e0e0>
irb(main):002:0> obj = Документ.new
=> #<#<Class:0x00000001b5e0e0>:0x00000001b57e48>

You may want to overwrite the class’ ::inspect method to display
properly in IRB. But note that it is regarded as a local variable
rather than a constant; you could use a namespace module with module
methods (which can be called using the :: syntax) to achieve what you
want, i.e. something like

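=========================
module Namespace

  # Keep the class in a constant so every call returns the same class.
  X = Class.new
  def X.inspect; "Документ"; end
  def X.name; "Документ"; end

  # Method whose name stands in for the Cyrillic class name.
  def self.Документ
    X
  end

end

obj = Namespace::Документ.new
=========================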

Note that I assign the new class to a separate constant (X) instead of
just returning Class.new in the Namespace::Документ() method, because
this would return a new class each time you call it (making checks
with #kind_of? useless).
However, using local names doesn’t seem a good idea to me altogether,
most programming languages are inspired by English and mixing it with
another language makes it look weird. Even worse, a framework like
Rails expects English nouns to be used as classnames, because it
automatically derives a whole bunch of further names from it by
inflecting the word according to English grammar rules. Local names
break such frameworks completely.

This obviously requires Ruby 1.9.

Vale,
Marvin
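
To illustrate the point above about keeping the class in a constant, here is a minimal sketch. The module names Fresh and Cached are hypothetical and exist only for this comparison. The calls use a dot (Fresh.Документ) rather than ::, on the assumption that the code should also run on Ruby 2.6 and later, where a name starting with a capital Cyrillic letter after :: is read as a constant rather than a method call.

```ruby
# encoding: utf-8

# Hypothetical module: builds a brand-new anonymous class on every call.
module Fresh
  def self.Документ
    Class.new
  end
end

# Hypothetical module: creates the class once and always returns that one.
module Cached
  X = Class.new

  def self.Документ
    X
  end
end

fresh_obj  = Fresh.Документ.new
cached_obj = Cached.Документ.new

# Each call to Fresh.Документ yields a different class, so the check fails.
fresh_match = fresh_obj.kind_of?(Fresh.Документ)

# Cached.Документ always returns X, so the check succeeds.
cached_match = cached_obj.kind_of?(Cached.Документ)
```

With the cached variant, #kind_of? and case/when checks behave as they would for an ordinary named class.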


Here Документ is only a local variable.
Also, the method definition looks like it came from hell.
I need to use UTF-8 names for the business logic of all the other
companies that do not care about English. All they want is suitable logic
in Russian.
Simply call Документ.Прочитать(ИдО=123).
Or call Документ.Записать().
This works in Python, and it could work in Ruby, but there is a very
strange limitation that the first character must be an ASCII capital
letter! ASCII! Why do you allow UTF-8 encoding and forbid making classes
like this:

class Пользователь
  def Найти(…)

  end
end

generated from a similarly ordered XML structure?
It is a strange and unreasonable check.

By the way, we don’t need ActiveRecord or Rails at all.

On 17 April 2012 17:47, Vladimir K. wrote:

class Пользователь
  def Найти(…)

  end
end

Vladimir, unfortunately Unicode and internationalisation in general are
hard. In C code (that’s what Ruby is written in), checking whether a
letter is an ASCII capital one is dead simple. Checking whether it is
a Unicode capital requires decoding it from whichever variant of UTF
it was encoded in and looking it up in one of the huge tables defining
various letter properties to check if it is a capital or not. You
would probably have to change parsing rules and a lot of internal code
to make it possible, since a capital first letter is the only difference
between a Ruby constant and a variable.

I think you should try posting to Ruby’s bug and feature tracker, at
http://bugs.ruby-lang.org/.

– Matma R.

So, you are saying that if I want normal names for my classes then I need
to change the Ruby source code, recompile it and be happy?! :)
No, that is the wrong way.
All I need is a boxed solution: install Ruby and use it (and be happy).
Also, all the other companies that will use this binding are not allowed
to download a Ruby installer from anywhere except the official Ruby site,
where there is no patch for capital Cyrillic symbols.
No way. I need an official patch or to wait for a fix in the next release;
for now I can’t use Ruby as a high-level binding. So we are still only on
Python.

Vladimir K. wrote on 17.04.2012 19:47:

of first character must be ASCII capital letter! ASCII! Why you allow

By the way, we don’t need ActiveRecord or Rails at all.

If you want to use 1C, go and use 1C. While internationalized names per
se are a very opinionated topic, in the case of Ruby they are definitely
and explicitly bad because all other parts of the ecosystem are already in
English and this will not change (not to mention that even if such a patch
gets accepted, and I hope it will not be, keywords will still be in
English for the foreseeable future).

Vladimir K. wrote in post #1057521:

No way. I need official patch or wait to fix it in next release, for now
I can’t use Ruby as high-level binding. So we still only on Python.

Really, I don’t get what the fuss is all about. If one idea doesn’t
work, well, then try something else. You may not get the perfect
solution, but you’ll certainly manage to circumvent this problem
(especially in Ruby!). Marvin’s solution, for example, seems perfectly
acceptable to me. I see no problem with the variables, because there’s
no real danger of overwriting them (they cannot be accessed from outside
the class).

But whining and complaining about a missing feature seems rather stupid
to me. Brian already told you that this was a conscious decision, so it
probably won’t change in the near future.

Bartosz Dziewoński wrote in post #1057002:

Vladimir, unfortunately Unicode and internationalisation in general are
hard. In C code (that’s what Ruby is written in), checking whether a
letter is an ASCII capital one is dead simple. Checking whether it is
a Unicode capital requires decoding it from whichever variant of UTF
it was encoded in and looking it up in one of the huge tables defining
various letter properties to check if it is a capital or not. You
would probably have to change parsing rules and a lot of internal code
to make it possible, since a capital first letter is the only difference
between a Ruby constant and a variable.

Please note that the above is untrue.

ruby 1.9 already does have the functions built-in to determine whether a
letter is Unicode capital or not; however it explicitly only accepts
ASCII upper-case for the start of constants. This was a conscious design
decision. See http://redmine.ruby-lang.org/issues/show/1853
and the other threads linked from that Redmine issue.

If you wanted to allow Unicode capitals as the start of constants, it
would be a very simple patch to the C source.
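
As a rough illustration of the difference between the two checks discussed here, the sketch below compares an ASCII-only test with a Unicode-aware one at the Ruby level. It uses the regexp property \p{Lu} (uppercase letter) and says nothing about how the parser's own C code performs the check; the helper names ascii_capital? and unicode_capital? are hypothetical.

```ruby
# encoding: utf-8

# Hypothetical helper: true only if the name starts with ASCII A-Z,
# which is the rule Ruby 1.9 applies to constant names.
def ascii_capital?(name)
  !!(name =~ /\A[A-Z]/)
end

# Hypothetical helper: true if the name starts with any Unicode
# uppercase letter (general category Lu).
def unicode_capital?(name)
  !!(name =~ /\A\p{Lu}/)
end

names = ["Document", "Документ", "документ", "Ωμέγα"]

# Map each name to the pair [ascii result, unicode result].
results = names.map { |n| [n, ascii_capital?(n), unicode_capital?(n)] }
```

Only "Document" passes the ASCII test, while "Document", "Документ" and "Ωμέγα" all pass the Unicode one.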

What I’ve managed to gather about upper/lower case handling in ruby 1.9
is documented at

(section 11)

Regards,

Brian.

P.S. I’m not saying I think this is a good design decision - in fact I
think the whole encoding aspect of ruby 1.9 is a dog’s breakfast - but
that’s purely my opinion.

lol

I’m looking forward to endless discussions whenever you think a certain
(questionable) feature is missing:

“Hi Matz, we urgently need multiple inheritance. Please implement it as
soon as possible!”

“Hi Matz, please change the keywords to Russian!”

It could be funny, but it makes us use a method in the place where we
need to use a class. Thanks for not forbidding method names that start
with a Cyrillic letter. This makes Ruby illogical and puts the Ruby binding
behind the current Python binding, which is entirely logical.

Vladimir K. wrote in post #1057566:

It could be funny, but it makes us use a method in the place where we
need to use a class. Thanks for not forbidding method names that start
with a Cyrillic letter. This makes Ruby illogical and puts the Ruby binding
behind the current Python binding, which is entirely logical.

Matz is our benevolent dictator, Guido is Python’s. Their decisions win.

You need to take each language as a whole. If on balance you prefer
Ruby, then use Ruby; if something else has the best combination of
features for you (or the fewest annoyances), then use that. Whichever
one gets the job done best for you.

The decision that constants must start with ASCII A-Z is fundamental and
unlikely to be revisited.

All I want is normal names for classes autogenerated from an XML
structure. The check that the first letter of a class name is an ASCII
capital must be removed, or at least fixed for the Russian and Greek
sections of UTF-8.

class Документ
  def Провести(…)

  end
end

is absolutely correct under the condition that the first letter must be a
capital. So this check is unreasonable and incorrect.
I don’t want to call my class RДокумент or use some hack.

Please allow us to use normal class names, not RКлиент.
It already works in the Python binding, so all I need is to do the same in
Ruby.

P.S. It is not “1C” and not related to it.

Brian C. wrote in post #1057729:

Vladimir K. wrote in post #1057566:

You need to take each language as a whole. If on balance you prefer
Ruby, then use Ruby; if something else has the best combination of
features for you (or the fewest annoyances), then use that. Whichever
one gets the job done best for you.

The decision that constants must start with ASCII A-Z is fundamental and
unlikely to be revisited.

All I need is a suitable binding for a company that uses Ruby.
Requiring class names to start with ASCII A-Z in a language based on UTF-8
is slightly strange, don’t you think?
What about Japanese letters, are they in ASCII? Or maybe the Greek alphabet?

Cyrillic and Greek both have proper capital letters in UTF-8.
The logic is no different from that of Latin capitals.

So a language based on UTF-8 with the condition “classes and modules must
start with a capital letter” must handle at least Greek and Cyrillic
letters too.

Actually it must, if you want to stay logical.

Vladimir K. wrote on 28.04.2012 17:02:

and
UTF-8.
This logic not differ from capital letter of Latin capital.

So language based on UTF-8 with condition “classes and modules must
start from capital letter” must handle at least Greek and Cyrillic
letters too.

Actually it is. If you want to stay in the logic way.

Okay, you really want this to be done and you don’t want to use a
patched version of Ruby. (Besides that, even if you manage to get this
change into the core, which is not very likely due to technical reasons,
you’ll have to wait for a point release, a year or something like that.)

As a Russian developer, I think the idea is stupid, but I’m a bit
curious because it is, on the other hand, somewhat hard. You can achieve
what you want by using the gem ‘polyglot’ and some other suffix for your
“Russian-enhanced” files (like .rbr maybe), or by overriding Kernel#load
and checking for a magic comment
(like # encoding:utf-8; russian-identifiers:true).
Then you’ll have to rewrite all constant accesses with Russian names to
something with a prefix (Документ->RДокумент) and load the modified file.
This modification can be done with a regexp, though I’d recommend using
Ripper.

This is as far as you can go without modifying the interpreter. It’s
easy to patch the constant name verification code, but, as I’ve already
said, the change won’t appear in mainstream redistributables any time
soon.
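
The regexp variant of the rewriting step described above could look roughly like the following. This is a deliberately naive sketch, not the polyglot gem's API and not a Ripper-based solution: the constant CYRILLIC_CONSTANT and the method rewrite_constants are hypothetical names, and the pattern would also rewrite matches inside strings and comments, which is one reason a real parser is the safer choice.

```ruby
# encoding: utf-8

# Hypothetical pattern: an identifier starting with a capital Cyrillic
# letter, not preceded by a word character or a dot (so method calls such
# as obj.Найти are skipped) and not directly after "def " (so method
# definitions are skipped).
CYRILLIC_CONSTANT = /(?<![\p{Alnum}_.])(?<!def )[А-ЯЁ][\p{Alnum}_]*/

# Hypothetical helper: prefix every matching identifier so that it becomes
# a name every Ruby version accepts as a constant (Документ -> RДокумент).
def rewrite_constants(source, prefix = "R")
  source.gsub(CYRILLIC_CONSTANT) { |name| prefix + name }
end

original = <<-SRC
class Документ
  def Найти(x)
  end
end
obj = Документ.new
obj.Найти(1)
SRC

rewritten = rewrite_constants(original)
```

The rewritten text would then be handed to eval or written to a temporary file and loaded, as the post suggests.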

So your solution looks like a cry that the language is incomplete.
I prefer to use global functions with suitable names rather than create
patches or install additional gems on each server of the cloud.
We need a complete boxed version from the official site without any
unstable gem.
So all we need is simply to take the standard Ruby 1.9+ interpreter, use
it, and be happy.
Now it is partially possible; we use strange methods like this:

def Документ
  RДокумент
end

It looks strange, but it works without any patch/gem/whatever.
Here RДокумент is the class name; as you can see, we have an illogical
part of the code.
Very bad. But we have no choice.
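
A sketch of how this workaround might be generated rather than written by hand, under several assumptions: the STRUCTURE hash is a hypothetical stand-in for the parsed XML file, the module name Platform is invented for the example, and the method bodies are placeholders where a real binding would call into the C++ layer. Each class is stored under an ASCII-prefixed constant and exposed through a method carrying the unprefixed Cyrillic name.

```ruby
# encoding: utf-8

# Hypothetical stand-in for the parsed XML: class name => method names.
STRUCTURE = {
  "Документ"     => ["Прочитать", "Записать"],
  "Пользователь" => ["Найти"]
}

# Hypothetical namespace for the generated binding.
module Platform
  def self.build(structure)
    structure.each do |class_name, method_names|
      klass = Class.new

      method_names.each do |method_name|
        # Placeholder body: returns the method name and its arguments.
        klass.send(:define_method, method_name) { |*args| [method_name, args] }
      end

      # Constant with an ASCII prefix, e.g. Platform::RДокумент.
      const_set("R#{class_name}", klass)

      # Accessor method with the plain name, e.g. Platform.Документ.
      define_singleton_method(class_name) { klass }
    end
  end
end

Platform.build(STRUCTURE)

doc    = Platform.Документ.new
result = doc.Прочитать(123)
```

Calling through the dot (Platform.Документ) keeps the accessor a method call on every Ruby version; because the class is also bound to a constant, it reports a readable name instead of an anonymous one.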

Thank Matz for such an amazing limitation of CONSTANT in UTF-8, which cuts
off absolutely all UTF-8 class names beyond ASCII.

P.S. By the way, irb works really badly in the Windows console with
Cyrillic names. Take a look at the Python interactive mode: it works
perfectly in the Windows console with Russian identifiers and shows them
correctly (irb shows ??? symbols). Both irb and python use an interpreter
based on UTF-8.

On May 1, 2012, at 2:44, Vladimir K. wrote:

Thank Matz for so amazing limitation of CONSTANT in UTF-8, cuts off
absolutely all UTF-8 class names over ASCII.

Patches welcome. Put up or shut up.

Actually he has one point.

If the internationalization was really needed then the first character
should be not limited to an ASCII-check alone.

But then again, I don’t use UTF myself so I don’t care about class
INSERT_FUNNY_RUSSIAN_CHARACTERS_HERE.

And, I have to say, Vladimir sounds a lot like a troll. Like Ilias.

I mean really, to use XML and then whine about limitations of ruby?

I doubt he even uses python lol.

Actually I am a C++ developer, but I sometimes use Python.
I don’t cry, “give me something like: a = [x*x for x in values]”.
All I want is to know “WHY?!” and to ask when Matz is planning to fix it.
Finally:

  1. Why do you limit the FIRST character of a class name to A-Z in a
    UTF-8 oriented language?
  2. When are you planning to fix it for Cyrillic and Greek class names?

Answers like “patch it yourself” are bad; I am not a pure Ruby developer,
and my patch could be of bad quality. For Ruby I am just creating a
suitable binding for a company that uses Ruby and our platform written in
C++. All I need is to generate a set of classes with Cyrillic names.
Currently I use methods that hide classes with improper names, just
because method names aren’t limited.

Maybe Ruby identifiers should be more like Java’s:

http://stackoverflow.com/questions/4838507/why-does-java-allow-control-characters-in-its-identifiers

Vladimir K. wrote in post #1059387:

Actually I am a C++ developer, but I sometimes use Python.
I don’t cry, “give me something like: a = [x*x for x in values]”.
All I want is to know “WHY?!” and to ask when Matz is planning to fix it.
Finally:

  1. Why do you limit the FIRST character of a class name to A-Z in a
    UTF-8 oriented language?
  2. When are you planning to fix it for Cyrillic and Greek class names?

Is it really so hard to understand? Do we really have to repeat what we
already said several times before?

Sorry, but I have to agree with Marc that you begin to sound like a
troll. Nobody is that ignorant.

You’ve got the wrong guys, anyway. We don’t make the decisions.