Do One wrote:
Yes, it was fixed yesterday with two consecutive patches. The first one
did not fix it completely, but before I found how to reproduce the bug it
got fixed a second time. (ruby 1.9.2 svn trunk)
It looks like this craziness is core behaviour for ruby 1.9,
unfortunately.
Notice that in your script which reproduces the problem, the encodings
of the two strings match. Results shown are for ruby 1.9.1p0 (2009-01-30
revision 21907) [i686-linux]
# encoding: utf-8
a = "a"
b = a.tr("z", "\u0101")
h = {a => 1}
p h.key?(a)   # true
p h.key?(b)   # false !!
p a           # "a"
p b           # "a"
p a.encoding  # #<Encoding:UTF-8>
p b.encoding  # #<Encoding:UTF-8>
p a == b      # true
p a.hash      # 137519702
p b.hash      # 137519703 AHA!
So two strings, with identical byte sequences and identical encodings,
calculate different hashes. So there must be some hidden internal state
in the string which affects the calculation of the hash. I presume this
is the flag ENC_CODERANGE_7BIT.
It’s hard to test whether this flag has been set correctly when
String#encoding doesn’t show it, so you have to use indirect methods
like String#hash.
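One way to make that indirect check repeatable is to compare a string against a copy rebuilt from its raw bytes, which should carry no stale cached state. This is only a sketch: consistent_hash? is a name made up here, and it rests on the assumption that a copy rebuilt this way has its internal state computed afresh.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Ruby): rebuild the string from its raw
# bytes so that any cached internal state has to be recomputed, then check
# that the copy is equal to the original and hashes the same way.
def consistent_hash?(str)
  fresh = str.bytes.to_a.pack("C*").force_encoding(str.encoding)
  fresh == str && fresh.hash == str.hash
end

a = "a"
b = a.tr("z", "\u0101")
p consistent_hash?(a)
p consistent_hash?(b)   # false here would point at stale hidden state
```

If the helper returns false for a string that prints and compares the same as its rebuilt copy, the difference has to come from something other than the bytes and the encoding.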
But now that I think I understand the problem, it’s easy to find more
examples of the same brokenness. Here’s one:
# encoding: utf-8
a = "a"
b = "aß"
b = b.delete("ß")
h = {a => 1}
p h.key?(a)   # true
p h.key?(b)   # false !!
p a           # "a"
p b           # "a"
p a.encoding  # #<Encoding:UTF-8>
p b.encoding  # #<Encoding:UTF-8>
p a == b      # true
p a.hash      # -590825394
p b.hash      # -590825393
I wonder just how many other string methods are broken in this way? And
how many extension writers are going to set this hidden flag correctly
in their strings, if even the ruby core developers don’t always do it?
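A rough way to start answering the first question is to build the same one-character string by several routes and see which results hash differently from a plain literal. The list of methods below is an arbitrary sample chosen for illustration, not a catalogue of known-bad methods, and the variable names are made up for this sketch.

```ruby
# encoding: utf-8

# Each entry produces the string "a" by a different route, most of them
# starting from or involving a non-ASCII character.
candidates = {
  "tr"      => lambda { "a".tr("z", "\u0101") },
  "delete"  => lambda { "aß".delete("ß") },
  "chop"    => lambda { "aß".chop },
  "slice"   => lambda { "aß"[0, 1] },
  "sub"     => lambda { "aß".sub("ß", "") },
  "squeeze" => lambda { "a".squeeze("ß") },
}

reference = "a"

# Keep only the routes whose result compares equal to the literal
# but hashes differently from it.
suspects = candidates.select do |_name, build|
  s = build.call
  s == reference && s.hash != reference.hash
end

p suspects.keys
```

On a build without the problem the printed list should be empty; any name that shows up is a method worth reporting.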
It looks like this flag is a bad optimisation.
- It needs recalculating every time a string is modified (thus negating
  the benefits of the optimisation)
- It introduces hidden state, which affects behaviour but cannot be
  directly tested
- If the state is not set correctly every time a string is generated
  or modified - and this includes in all extension modules - then things
  break.
Regards,
Brian.