Regular expression to find a break in a pattern

addis_a · May 23, 2013, 6:13pm

I have a large file with lots of gibberish in it, and I’m trying to find
the meaningful sections.

Essentially I’ll have something like this:

To: "1313131"
From: "1313131"
random data lines

To: "1313132"
From: "1313132"
random data lines

To: "1313133"
From: "1313132"
random data lines

To: "1313134"
From: "1313134"
random data lines

What I need to do is locate the line(s) where From is different from To.
In this case, the one From “1313132” To “1313133”.

I don’t know how to do this kind of match, but I assume that Ruby has a
way?

virtuoso · May 23, 2013, 7:33pm

Use regex capturing with assignment. Capture the string or number in the
From field using parentheses, e.g. /(\d+)/, then compare the captured
value, which is available as $1, with the next string.

Edited to add Ruby example:

x = '12345'
y = '2345'
if x =~ /(\d+)/      # on a match, $1 holds the captured digits ("12345")
  if y == $1
    puts "Yep!"
  else
    puts "Nope!"
  end
end

output
Nope!

virtuoso · May 23, 2013, 8:37pm

Here’s a regex that captures all the cases where To matches From:

I can’t find an easy switch to find the mismatches you need, but maybe
it’ll provide a starting point.
Plus, Rubular is a great resource for exploring regexes.
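
As an illustration of the idea (a sketch only, not necessarily the pattern originally posted), a backreference such as \1 can require the From value to repeat the To value. The sample data and the names SAMPLE, SAME_PAIR and matching_ids are made up for this example:

```ruby
# Hypothetical sample data, shaped like the example in the question.
SAMPLE = <<~DATA
  To: "1313131"
  From: "1313131"
  random data lines

  To: "1313133"
  From: "1313132"
  random data lines
DATA

# \1 refers back to whatever group 1 captured, so the pattern only
# matches when the From value repeats the To value exactly.
SAME_PAIR = /^To: "(\d+)"\nFrom: "\1"$/

# Returns the ids whose To and From lines agree.
def matching_ids(text)
  text.scan(SAME_PAIR).flatten
end

p matching_ids(SAMPLE) # prints ["1313131"]
```

This finds the matching pairs only; the mismatched pair is simply skipped.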

cheers

virtuoso · May 23, 2013, 8:58pm

I use Rubular a lot, it’s great!

Thanks for the ideas. I’ve decided to loop through the file using 2
variables, similarly to Derrick’s suggestion.

Nice trick with “\1” Chris, I haven’t tried using that inside the same
expression before. I’ll see whether I can use that in this instance.

I was wondering whether Ruby’s Regexp had this kind of option built in,
but I guess this scenario is more on the conditional side of
programming.
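
A minimal sketch of what such a two-variable loop might look like, assuming every To line is followed by its From line and that the values are double-quoted as in the example. The method name and the file name in the comment are hypothetical:

```ruby
# Returns [to, from] pairs where the From value differs from the
# preceding To value. `lines` is any enumerable of strings, for example
# File.foreach("data.txt") for a large file (hypothetical file name).
def mismatched_pairs(lines)
  to = nil
  pairs = []
  lines.each do |line|
    if line =~ /^To: "(.*)"/
      to = $1                      # remember the most recent To value
    elsif line =~ /^From: "(.*)"/
      from = $1
      pairs << [to, from] if to && from != to
      to = nil                     # wait for the next To line
    end
  end
  pairs
end

sample = [
  'To: "1313132"', 'From: "1313132"', 'random data lines',
  'To: "1313133"', 'From: "1313132"', 'random data lines'
]
p mismatched_pairs(sample) # prints [["1313133", "1313132"]]
```

Reading line by line keeps memory use flat, which may matter for a large file.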

virtuoso · May 24, 2013, 6:40am

On Thu, May 23, 2013 at 11:13 AM, Joel P. wrote:

random data lines
Posted via http://www.ruby-forum.com/.

Here is a regex that works for your example data.

text = '
To: "1313131"
From: "1313131"
random data lines

To: "1313132"
From: "1313132"
random data lines

To: "1313133"
From: "1313132"
random data lines

To: "1313134"
From: "1313134"
random data lines

To: "abc"
From: "def"
random data lines
'

regex = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/

text.scan(regex) # => [["1313133", "1313132"], ["abc", "def"]]
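
One caveat that may be worth checking against real data: the (?!\1) lookahead rejects any From value that merely starts with the To value, so a pair like To "131" / From "1313" would be missed. A sketch of a stricter variant, assuming the values are always closed by a double quote; PREFIX_CASE, LOOSE and STRICT are names made up for this example:

```ruby
# Hypothetical data where the From value only starts with the To value.
PREFIX_CASE = %(To: "131"\nFrom: "1313"\nrandom data lines\n)

# Lookahead without the closing quote: "1313" begins with "131",
# so the pair is wrongly treated as equal.
LOOSE  = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/

# Including the closing quote in the lookahead makes the comparison exact.
STRICT = /To: "(.*?)"\nFrom: "(?!\1")(.*?)"$/

p PREFIX_CASE.scan(LOOSE)  # prints []
p PREFIX_CASE.scan(STRICT) # prints [["131", "1313"]]
```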

virtuoso · May 24, 2013, 12:53am

You can do your regex test against both contexts, ^To: and ^From:, and
use post_match to reveal the contents after:
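
A sketch of how that could look, assuming the input is processed line by line; the method name is hypothetical, while MatchData#post_match is the standard call that returns everything after the matched text:

```ruby
# Collects [to, from] pairs whose values differ, using post_match to
# grab whatever follows the "To:" / "From:" prefix on each line.
def mismatches_via_post_match(lines)
  to = nil
  result = []
  lines.each do |line|
    if (m = /^To:\s*/.match(line))
      to = m.post_match.strip      # text after "To: ", quotes included
    elsif (m = /^From:\s*/.match(line))
      from = m.post_match.strip
      result << [to, from] if to && to != from
      to = nil
    end
  end
  result
end

sample = ["To: \"1313133\"\n", "From: \"1313132\"\n", "random data lines\n"]
p mismatches_via_post_match(sample)
```

The values keep their surrounding quotes here, since post_match returns the rest of the line untouched.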

~Stu

virtuoso · May 26, 2013, 7:18pm

On Fri, May 24, 2013 at 12:43 PM, Joel P. wrote:

Excellent! I tried negatives using (?!\1) before but I couldn’t get them
to work. Thanks for the help.

You can even get the whole line content if you like

irb(main):053:0> s.scan %r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m
=> [["To: \"1313133\"\nFrom: \"1313132\"\nrandom data lines\n\n", "\"1313133\""]]
irb(main):054:0> s.scan(%r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m).map(&:first)
=> ["To: \"1313133\"\nFrom: \"1313132\"\nrandom data lines\n\n"]
irb(main):055:0> puts s.scan(%r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m).map(&:first)
To: "1313133"
From: "1313132"
random data lines

Or with a block:

irb(main):057:0> s.scan %r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m do puts $1 end;nil
To: "1313133"
From: "1313132"
random data lines

Kind regards

robert

virtuoso · May 26, 2013, 8:06pm

A group within a group, and scan with a block? I had no idea!
Ruby, you continually delight me

virtuoso · May 24, 2013, 12:43pm

Excellent! I tried negatives using (?!\1) before but I couldn’t get them
to work. Thanks for the help.

virtuoso · May 26, 2013, 8:11pm

On Sun, May 26, 2013 at 8:06 PM, Joel P. wrote:

A group within a group,

This is standard regular expression functionality: I don’t know a single
regexp engine with support for groups that can’t do that.

and scan with a block? I had no idea!

That is a fairly old feature of the standard lib - even in 1.8.6 - and
so important when scanning large volumes of text.
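
To illustrate why the block form helps with large inputs, here is a small sketch: with a block, String#scan hands each match's captures to the block as it goes instead of collecting every match into one array first. The pattern is the To/From mismatch regex from earlier in the thread; the constant and method names are made up for this example:

```ruby
MISMATCH = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/

# Yields each mismatching [to, from] pair to the caller's block
# without building an intermediate array of all matches.
def each_mismatch(text)
  text.scan(MISMATCH) { |to, from| yield to, from }
end

data = %(To: "1313131"\nFrom: "1313131"\nrandom data lines\n\n) +
       %(To: "1313133"\nFrom: "1313132"\nrandom data lines\n)

each_mismatch(data) { |to, from| puts "To #{to} but From #{from}" }
# prints: To 1313133 but From 1313132
```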

Ruby, you continually delight me

Good!

For a spec of the regexp language, I find this site pretty useful:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Kind regards

robert