I’m using Ruby 1.8.6, and I just discovered something rather
interesting. Here is a test:
require 'test/unit'

class TestRegexBug < Test::Unit::TestCase
  def test_bug
    hours = "pon-čet"
    assert(hours =~ /[č]et/i)
    assert(hours =~ /čet/i)
    assert(hours =~ /-čet/i)
    assert(hours =~ /[cč]et/i)
    assert(hours =~ /-[č]et/i)  # this is the assertion that fails
  end
end
As you can see, this only happens with Unicode letters (the last test
fails). I’m used to the fact that //i doesn’t work for Unicode chars,
and I already know that you need two dots to match one of them. But
this problem is different and weirder, because what triggers it is a
minus sign before the square brackets: if you remove either the ‘-’ or
the ‘[]’ from the regex, it works.
Can you comment?
thank you,
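For anyone following along, the "two dots" remark can be reproduced on a current Ruby with a rough sketch like the one below. It assumes that a binary string plus the /n regex flag is a fair stand-in for how 1.8.6 treats text when $KCODE is not set; the constant names are made up for illustration.

```ruby
# Raw UTF-8 bytes of "pon-čet" ("č" is the two bytes C4 8D).
# String#b marks the string as binary, so every byte counts as one character.
RAW = "pon-\xC4\x8Det".b

ONE_DOT  = /-.et/n    # exactly one byte between "-" and "et"
TWO_DOTS = /-..et/n   # two bytes, one per byte of "č"

ONE_DOT_HIT  = !(RAW =~ ONE_DOT).nil?
TWO_DOTS_HIT = !(RAW =~ TWO_DOTS).nil?

puts "one dot matches:  #{ONE_DOT_HIT}"
puts "two dots match:   #{TWO_DOTS_HIT}"
```

With byte-wise matching, only the two-dot pattern should find a match.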
On Mar 3, 2008, at 2:24 PM, D. Krmpotic wrote:
require 'jcode'
Ruby is not natively aware of Unicode, but you can get all of these to
pass if you give it the $KCODE hint.
Rob B. http://agileconsultingllc.com
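A minimal sketch of what that hint could look like, for reference. The `$KCODE = 'u'` assignment and `require 'jcode'` only apply to Ruby 1.8. The version guard is an addition here so the same file also runs on 1.9 and later, where strings and regexps carry their own encoding. The constant names are illustrative and not from the original test.

```ruby
# encoding: utf-8

# Ruby 1.8 only: declare that text should be treated as UTF-8.
# On 1.9 and later this branch is skipped entirely.
if RUBY_VERSION < '1.9'
  $KCODE = 'u'
  require 'jcode'
end

HOURS = "pon-čet"

# The five patterns from the original test, in the same order.
PATTERNS = [/[č]et/i, /čet/i, /-čet/i, /[cč]et/i, /-[č]et/i]

# true for every pattern that finds a match in HOURS
RESULTS = PATTERNS.map { |re| !(HOURS =~ re).nil? }

puts RESULTS.inspect
```

On 1.8 the literal patterns could also be written with the /u flag (for example /-[č]et/u) instead of setting the global, if changing $KCODE for the whole program is not wanted.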
Great info… completely forgot that this is available…
thank you
2008/3/3, D. Krmpotic:
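In the regex, [č] is a character class containing two bytes, because
“č” is encoded in UTF-8 as \304\215. Ruby therefore tries to match a
minus, followed by exactly one of those bytes, followed by “et”. So the
regex would match “pon-\304et” or “pon-\215et”, but not
“pon-\304\215et”. Without the minus, [č]et still matches, because the
match can simply start at the second byte, \215.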
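The byte-level behaviour described above can be checked on a current Ruby with a sketch along these lines. It assumes that binary strings and /n regexps are a reasonable approximation of what 1.8.6 does without $KCODE; the names `byte_index`, `BYTE_CLASS`, `BARE_CLASS`, `BYTE_GROUP` and `FULL` are made up for the example.

```ruby
# "č" in UTF-8 is the two bytes 0xC4 0x8D (octal \304 \215).
BYTE_CLASS = /-[\xC4\x8D]et/n    # "-", then ONE byte out of {C4, 8D}, then "et"
BARE_CLASS = /[\xC4\x8D]et/n     # the same class without the leading "-"
BYTE_GROUP = /-(?:\xC4\x8D)et/n  # "-", then BOTH bytes in order, then "et"

FULL = "pon-\xC4\x8Det"          # the bytes of "pon-čet"

# Byte offset of the first match, or nil when nothing matches.
def byte_index(pattern, text)
  text.b =~ pattern
end

p byte_index(BYTE_CLASS, FULL)          # => nil
p byte_index(BYTE_CLASS, "pon-\xC4et")  # => 3
p byte_index(BYTE_CLASS, "pon-\x8Det")  # => 3
p byte_index(BARE_CLASS, FULL)          # => 5
p byte_index(BYTE_GROUP, FULL)          # => 3
```

The fourth line is the interesting one: without the minus, the class matches at offset 5, which is the second byte of “č”, and that is why the first assertion in the original test passes.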