Unicode in Forms & Ruby Regex

I’m expecting a validates_format_of with a regex like this

/^[a-zA-Z\xC0-\xD6\xD9-\xF6\xF9-\xFF.’-\ ]*?$/

to allow many of the normal characters like ö é å to be submitted via
web form.

However, the extended characters are being rejected.

This works just fine though (which is just a-zA-Z)

/^[\x41-\x5A\x61-\x7A.’-\ ]*?$/

So, what’s the secret to using unicode character ranges in Ruby regex
or Rails validations?

It also seems to fail with full \x0000 numbers; is there a limit at \xFF?
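
A likely explanation, sketched under assumptions: \xNN escapes name single bytes (and \x takes at most two hex digits, which would explain the apparent limit at \xFF), while a UTF-8 form submission sends ö as two bytes, so a byte range never sees the character as a whole. The sketch below assumes Ruby 1.9 or later with UTF-8 input, where \uXXXX escapes name code points; the Ruby 1.8 of this thread's era would more likely need the /u regex flag instead. NAME_FORMAT and allowed_name? are hypothetical names, and \A and \z replace ^ and $ so that a newline cannot smuggle in a second, unchecked line.

```ruby
# Hypothetical constant: the same ranges as the regex above, written as
# code points (\uXXXX) instead of bytes (\xXX). Assumes Ruby 1.9+.
NAME_FORMAT = /\A[a-zA-Z\u00C0-\u00D6\u00D9-\u00F6\u00F9-\u00FF.\u2019 \-]*\z/

# Hypothetical helper: true when every character is in the whitelist.
def allowed_name?(text)
  !(NAME_FORMAT =~ text).nil?
end

# In UTF-8, "ö" (U+00F6) is two bytes, so a \xF6 byte range never matches it.
p "\u00F6".bytes.map { |b| b.to_s(16) }      # prints ["c3", "b6"]

p allowed_name?("Bj\u00F6rn \u00C5kesson")   # prints true
p allowed_name?("Bj\u00F6rn <script>")       # prints false
```

In a Rails model the same constant could presumably be passed as the :with option of validates_format_of.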


def gw
acts_as_n00b
writes_at(www.railsdev.ws)
end

On Nov 30, 2007, at 3:03 AM, Greg W. wrote:

> /^[\x41-\x5A\x61-\x7A.’-\ ]*?$/
>
> So, what’s the secret to using unicode character ranges in Ruby regex
> or Rails validations?
>
> It also seems to fail with full \x0000 numbers; is there a limit at
> \xFF?

OK, so now that I’ve come to recognize that unicode support in Ruby
totally blows, are there any hacks out there anywhere?

I want to:

  • allow a web site visitor to enter the “usual” extended latin
    characters into a web form

  • use a regular expression (this is where the crux of the problem is)
    to ensure that all characters in the string are allowed

  • save that data to MySQL (utf8)

  • display it with the correct characters intact

It’s no problem to capture the text, store it, and redisplay it, but
only without filtering/validation, which of course is not acceptable.

Is anyone doing whitelisted character validations like this?
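
For reference, a minimal sketch of one way such a whitelist check might look, assuming Ruby 1.9 or later and UTF-8 input (not the Ruby 1.8 setup described above). It uses the \p{Latin} script property, which is broader than hand-written ranges (it also admits Ø, ø and the Latin Extended letters). DISALLOWED and rejected_chars are hypothetical names chosen for illustration.

```ruby
# Hypothetical whitelist, negated: anything that is NOT a Latin-script
# letter, a period, an apostrophe (straight or curly), a space or a hyphen.
DISALLOWED = /[^\p{Latin}.'\u2019 \-]/

# Hypothetical helper: returns the distinct characters that fail the
# whitelist, so an error message can name them. Empty array means valid.
def rejected_chars(text)
  # Malformed UTF-8 cannot be matched reliably, so reject it outright.
  return [:invalid_encoding] unless text.valid_encoding?
  text.scan(DISALLOWED).uniq
end

p rejected_chars("\u00C9lo\u00EFse O\u2019Brien")   # prints []
p rejected_chars("R2-D2 <b>")                        # prints ["2", "<", ">"]
```

Checking the encoding first matters because bytes that are not valid UTF-8 can make the regex scan raise instead of returning a result.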

