Regexp help

Hello everyone,

I have a string of the form

2h 3m

or

3m 2h

or

2h 3minutes

or

2hour 3min

and so on

Is there a smart regexp one-liner that could produce

[2, 3]

for any of the above inputs? If someone types just, for example,

2

then that should produce [2]. I know that there will be an m or an h.

/Marcus

Hello

[2, 3]
If you want to get [2,3] in both cases, that will be really difficult.
As far as I know, you can only do that in C#, which has named capturing
groups. In all the other languages I know, the capturing groups are
numbered when they are found… That rules it out.

By the way, would it be difficult to implement named capturing groups
in regular expressions? Would that interest anyone?

Cheers!

Vince

Not so difficult, but it’s not, as far as I can see, a
one liner. I am working something up at the moment
using an array of regexps.

-- Vincent F.

ah neat, Jordan, and more elegant than parsing an
array of regexps :)

Marcus B. wrote:

Is there a smart regexp one liner that could produce

[2, 3]

r = Regexp.new(/(\d+)h.*?(\d+)m/) # lazy .*? so a two-digit count like "45m" is captured whole
s1 = "2h 3m"
s2 = "2h 3minutes"
s3 = "2hour 3min"
m = r.match(s1)
p [m[1].to_i, m[2].to_i] # => [2, 3]
m = r.match(s2)
p [m[1].to_i, m[2].to_i] # => [2, 3]
m = r.match(s3)
p [m[1].to_i, m[2].to_i] # => [2, 3]

Regards,
Jordan

Hi,

2h 3m

2

then that should produce [2]

for any of the above input? I know that there will be an m or an h.

/Marcus

str = "2h 3m" # or something
str.scan(/(\d+)(\w*)/).sort_by{|x|x[1]}.collect{|x|x[0].to_i}
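Why this handles both orders may not be obvious at first glance, so here is a small sketch of the intermediate values (the variable names pairs and sorted are just for illustration). With two groups, scan yields one [digits, unit] pair per match, and sorting by the unit string puts "h..." ahead of "m..." whichever order the user typed them in.

```ruby
str = "3m 2h"

# scan with two capture groups returns one [digits, unit] pair per match
pairs = str.scan(/(\d+)(\w*)/)
p pairs                       # => [["3", "m"], ["2", "h"]]

# "h..." sorts before "m...", so hours come first regardless of input order
sorted = pairs.sort_by { |_num, unit| unit }
p sorted                      # => [["2", "h"], ["3", "m"]]

p sorted.map { |num, _unit| num.to_i }  # => [2, 3]

# A bare number gets an empty unit string, so it still yields one element
p "2".scan(/(\d+)(\w*)/).map { |num, _unit| num.to_i }  # => [2]
```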

Regards,

Park H.

But, of course, that won’t capture “3m 2h”, like you described…

On 9/29/06, Park H. wrote:

str = "2h 3m" # or something
str.scan(/(\d+)(\w*)/).sort_by{|x|x[1]}.collect{|x|x[0].to_i}

Regards,

Park H.

Nice one, thanks a lot!

/Marcus

Tom A. wrote:

But, of course, that won’t capture “3m 2h”, like you described…

True…

So:

r = Regexp.new(/(\d+)h?m?.*?(\d+)m?h?/)

'Course, then you’ll have [3, 2] for the edge case rather than [2,
3]…but to get the full functionality that the OP described (including
the case where just “2” is given), you’d need fancier logic than just
regexp anyhow.
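One possible shape for that "fancier logic", as a sketch only: it assumes units always start with h or m, and that a string with no unit at all is a bare number. parse_duration is a hypothetical name, not something proposed in the thread.

```ruby
# Hypothetical helper, not taken from the thread.
# Assumes units start with "h" or "m" and that a string with no unit
# is just a bare number such as "2".
def parse_duration(str)
  hours   = str[/(\d+)\s*h/, 1]  # digits right before an h... unit
  minutes = str[/(\d+)\s*m/, 1]  # digits right before an m... unit

  if hours || minutes
    # compact drops whichever unit is missing
    [hours, minutes].compact.map(&:to_i)
  else
    # no unit at all: fall back to the bare number(s)
    str.scan(/\d+/).map(&:to_i)
  end
end

["2h 3m", "3m 2h", "2h 3minutes", "2hour 3min", "2"].each do |s|
  p parse_duration(s)
end
```

Because of compact, "3m" on its own also comes back as [3]; dropping compact and returning [hours, minutes] would keep the information about which unit was given.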

Regards,
Jordan

Vincent F. wrote:

Is there a smart regexp one liner that could produce
re = Regexp.new(/(\d+)h.*?(\d+)m|(\d+)m.*?(\d+)h/)

Vince

And the one-liner:

$ irb

"3m 2h".scan(/(\d+)h.*?(\d+)m|(\d+)m.*?(\d+)h/).flatten.values_at(0,1,3,2).compact
=> ["2", "3"]

"2h 3m".scan(/(\d+)h.*?(\d+)m|(\d+)m.*?(\d+)h/).flatten.values_at(0,1,3,2).compact
=> ["2", "3"]

It’s possible to add .map { |i| i.to_i } at the end of this one-liner if
the result array must contain integers instead of strings.
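The values_at(0,1,3,2) step is what makes the one-liner order-independent. Here is a sketch that prints the intermediate slots; it uses a lazy .*? so that a two-digit count such as "45m" is not cut down to its last digit by a greedy .*, and hours_minutes is a hypothetical helper name.

```ruby
# Alternation: "hours first" fills groups 1-2, "minutes first" fills 3-4.
# Lazy .*? keeps multi-digit counts such as "45m" whole.
ALT_RE = /(\d+)h.*?(\d+)m|(\d+)m.*?(\d+)h/

# Hypothetical helper wrapping the one-liner.
def hours_minutes(str)
  # scan gives one 4-element array per match; flatten turns it into 4 slots.
  # The alternative that did not match leaves its two slots as nil.
  slots = str.scan(ALT_RE).flatten
  # Slots 0,1 are hours,minutes from the first alternative;
  # slots 3,2 are hours,minutes from the second one (note the swap).
  slots.values_at(0, 1, 3, 2).compact.map { |i| i.to_i }
end

p "2h 3m".scan(ALT_RE).flatten  # => ["2", "3", nil, nil]
p "3m 2h".scan(ALT_RE).flatten  # => [nil, nil, "3", "2"]
p hours_minutes("3m 2h")        # => [2, 3]
p hours_minutes("1h 45m")       # => [1, 45]
```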

Hello again!

[2, 3]

If you want to get [2,3] in both cases, that will be really difficult.
As far as I know, you can only do that in C#, which has named capturing
groups. In all the other languages I know, the capturing groups are
numbered when they are found… That rules it out.

Well, just to contradict myself, although this is no one-liner:

def scan(str)
  re = Regexp.new(/(\d+)h.*?(\d+)m|(\d+)m.*?(\d+)h/)
  if m = re.match(str)
    return [m[1], m[2]] if m[1] # matched "hours first"
    return [m[4], m[3]]         # matched "minutes first": hours are in group 4
  end
end

p scan("2h 3m")
p scan("3m 2h")

Cheers!

Vince

And the one-liner:

$ irb

"3m 2h".scan(/(\d+)h.*?(\d+)m|(\d+)m.*?(\d+)h/).flatten.values_at(0,1,3,2).compact

That’s a nice one!

Vince

I have a string of the form
[…]

Is there a smart regexp one liner that could produce

Hello Marcus,

here’s my take on it:

times = %w{ 2hour3min 2h3minutes 3m2h 2h3m }
=> ["2hour3min", "2h3minutes", "3m2h", "2h3m"]

times.map{ |t| [t[/\d+h[a-z]*/].to_i, t[/\d+m[a-z]*/].to_i] }
=> [[2, 3], [2, 3], [2, 3], [2, 3]]

Probably a little slower than the other solutions but perhaps easier to
grasp.
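One thing to watch with this approach: String#[] with a Regexp returns nil when nothing matches, and nil.to_i is 0, so a missing unit becomes 0 instead of disappearing (a bare "2" gives [0, 0] rather than the [2] Marcus asked for). A quick check, with to_pair as a made-up helper name:

```ruby
# Hypothetical helper applying Matthias's two lookups to one string.
def to_pair(t)
  # String#[] with a Regexp returns the matched substring, or nil;
  # String#to_i reads the leading digits ("2hour" -> 2), and nil.to_i is 0.
  [t[/\d+h[a-z]*/].to_i, t[/\d+m[a-z]*/].to_i]
end

p to_pair("3m2h")  # => [2, 3]
p to_pair("2h")    # => [2, 0]
p to_pair("2")     # => [0, 0]
```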

Regards
Matthias

Park H. wrote:

str = "2h 3m" # or something
str.scan(/(\d+)(\w*)/).sort_by{|x|x[1]}.collect{|x|x[0].to_i}

Very nice idea, Park! I wouldn’t have thought of that. Slightly shorter:

str.scan(/(\d+)(\w)/).sort_by{|n,u|u}.map{|n,u|n.to_i}
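The switch from (\w*) to (\w) is more than a length saving: it also matters for input typed without spaces, because \w matches digits too. A small comparison (unspaced is just an illustrative variable):

```ruby
unspaced = "2h3m"

# \w also matches digits, so a greedy \w* swallows "h3m" in one match:
p unspaced.scan(/(\d+)(\w*)/)  # => [["2", "h3m"]]

# A single \w takes only the unit letter, leaving "3m" for the next match:
p unspaced.scan(/(\d+)(\w)/)   # => [["2", "h"], ["3", "m"]]

p unspaced.scan(/(\d+)(\w)/).sort_by { |n, u| u }.map { |n, u| n.to_i }  # => [2, 3]

# Trade-off: (\w) needs a character after the digits, so a bare "2" finds nothing.
p "2".scan(/(\d+)(\w)/)        # => []
```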

Regards,
Pit

On Fri, 29 Sep 2006, Vincent F. wrote:

Is there a smart regexp one liner that could produce

[2, 3]

If you want to get [2,3] in both cases, that will be really difficult.
As far as I know, you can only do that in C#, which has named capturing
groups. In all the other languages I know, the capturing groups are
numbered when they are found… That rules it out.

irb> a
=> ["2h 3m", "3m 2h", "2h 3minutes", "2hour 3min", "2"]
irb> re
=> /(?=.*\b(\d+)(?=h|\b))(?=.*\b(\d+)m|)/
irb> a.map {|x| x.match(re).captures}
=> [["2", "3"], ["2", "3"], ["2", "3"], ["2", "3"], ["2", nil]]

On Sep 30, 2006, at 2:55 AM, Relm wrote:

As far as I know, you can only do that in C#, which has named capturing
groups. In all the other languages I know, the capturing groups are
numbered when they are found… That rules it out.

Python regexps have named capturing groups. It’s extremely helpful
if you need to construct complicated patterns, because the index of
each capturing group can easily change when you add and remove
things in the regexp.
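For what it's worth, Ruby itself gained named capturing groups, written (?<name>...), in version 1.9, after this thread. A sketch of Relm's lookahead approach with the groups named, so the captures are read by name rather than by position (NAMED_RE and the sample inputs are just for illustration):

```ruby
# Named groups use the (?<name>...) syntax (Ruby 1.9 and later).
# Each lookahead scans the whole string independently, so the order of
# the "h" and "m" parts in the input does not matter.
NAMED_RE = /(?=.*\b(?<hours>\d+)(?=h|\b))(?=.*\b(?<minutes>\d+)m|)/

inputs = ["2h 3m", "3m 2h", "2h 3minutes", "2hour 3min", "2"]
results = inputs.map do |s|
  m = NAMED_RE.match(s)
  [m[:hours], m[:minutes]]   # read captures by name, not by index
end

p results
# => [["2", "3"], ["2", "3"], ["2", "3"], ["2", "3"], ["2", nil]]
```

Adding another group to the pattern later would not change how m[:hours] and m[:minutes] are looked up, which is the benefit Tom describes.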

Tom