On 6/21/06, Yukihiro M. wrote:
Can you elaborate? I don’t want to see disaster whatever it is.
matz.
Single scripts and small self-contained applications are almost always
written in a single codepage. Text data processing is usually done in
that same codepage too, which simplifies life a lot even with the
current String as a byte vector. So recoding is pure overhead here, and
external data is only recoded on input/output, in a relatively small
number of well-defined places, using a known subset of source and
target encodings.
In this case, when you know what to expect from your file/network IO,
things are OK.
It is also OK when part of a script is extracted and evolves into a
library, as long as you use it in the same environment.
But consider the case where several third-party libraries are used, all
returning strings in different encodings. gettext for libraries won’t
solve everything, since even externalized strings will have some
particular encoding.
E.g. localization libraries can’t fit into ASCII alone.
Now every method call behaves like a kind of IO with respect to the
encoding of the passed parameters.
The number of I/O points grows drastically.
How can this be solved in a consistent and reliable manner?
a) Simply declare in the documentation: "Methods in these classes
require strings to be in UTF16, you’ve been warned!!!"
So users of that code will have to remember those constraints and
enforce the encoding of their data before calling those methods. Given
the dynamic nature of Ruby, things will break in unexpected places.
No, I dislike the idea of writing:
    str.enforce_encoding!(BooClass::INTERNAL_ENCODING)
    b = BooClass.new(str)
b) Take care in the called methods to enforce the encoding:
    def process_formatting(str)
      str.enforce_encoding!(MY_INTERNAL_ENCODING)
      # now it is compatible with the rest of my code
      # and I can do something with it
    end
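For illustration, here is a minimal sketch of option (b). It assumes a
Ruby where every String carries its own encoding (1.9 and later), so the
real String#encode stands in for the hypothetical enforce_encoding!
above. MY_INTERNAL_ENCODING, its UTF-8 value and the body of
process_formatting are illustrative assumptions only.

```ruby
# Assumption: the library has picked UTF-8 as its internal encoding.
MY_INTERNAL_ENCODING = Encoding::UTF_8

# Transcode the argument on entry so the rest of the method only ever
# sees the internal encoding. String#encode returns a new string, so
# the caller's object is not modified.
def process_formatting(str)
  internal = str.encode(MY_INTERNAL_ENCODING)
  "[#{internal}]"
end

# A caller that happens to hold Latin-1 data.
latin1 = "caf\u00E9".encode(Encoding::ISO_8859_1)
result = process_formatting(latin1)
```

Note that every public method taking a string needs the same first
line, which is exactly the repetition that makes this approach easy to
get wrong.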
This is also too error-prone.
And what about processing the results of calls? Should the caller code
take care of that too?
    res_str = SomeUtil.fancy_format( str )
    res_str.enforce_encoding!(MY_INTERNAL_ENCODING)
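The caller-side variant can be sketched the same way, again assuming
String#encode from Ruby 1.9 and later in place of the hypothetical
enforce_encoding!. SomeUtil here is only a stub that pretends to be a
third-party library returning UTF-16, and fancy_format_internal is a
made-up wrapper name.

```ruby
MY_INTERNAL_ENCODING = Encoding::UTF_8

# Stub standing in for a third-party library that returns UTF-16LE.
module SomeUtil
  def self.fancy_format(str)
    "*#{str}*".encode(Encoding::UTF_16LE)
  end
end

# One wrapper per foreign call keeps the transcoding in a single place
# instead of repeating it after every call site.
def fancy_format_internal(str)
  SomeUtil.fancy_format(str).encode(MY_INTERNAL_ENCODING)
end

res_str = fancy_format_internal("abc")
```

This only moves the problem around: one such wrapper is needed for
every foreign method that returns a string.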
With input parameters and returned results that are complex structures
containing String fields, things get even worse.
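To show how far this goes, here is a rough sketch of what a caller
would need for nested data. normalize_encoding is a hypothetical
helper, INTERNAL is an assumed UTF-8 internal encoding, and
String#encode requires Ruby 1.9 or later. Custom objects with String
attributes are not handled at all.

```ruby
INTERNAL = Encoding::UTF_8

# Recursively transcode every String found inside Arrays and Hashes.
# Anything else (numbers, symbols, nil, custom objects) is returned
# unchanged.
def normalize_encoding(obj, target = INTERNAL)
  case obj
  when String
    obj.encode(target)
  when Array
    obj.map { |item| normalize_encoding(item, target) }
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[normalize_encoding(key, target)] = normalize_encoding(value, target)
    end
  else
    obj
  end
end

# A result mixing several encodings, as different libraries might return.
record = {
  name:  "Z\u00FCrich".encode(Encoding::ISO_8859_1),
  tags:  ["caf\u00E9".encode(Encoding::UTF_16LE)],
  count: 3
}
clean = normalize_encoding(record)
```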
Who will ever cope with these issues?
Probably this is what Julik meant by “disaster”?
Things shouldn’t be that complicated.