I have a large file with lots of gibberish in it, and I’m trying to find
the meaningful sections.
Essentially I’ll have something like this:
To: "1313131"
From: "1313131"
random data lines
To: "1313132"
From: "1313132"
random data lines
To: "1313133"
From: "1313132"
random data lines
To: "1313134"
From: "1313134"
random data lines
What I need to do is locate the line(s) where From is different from To.
In this case, the one From “1313132” To “1313133”.
I don’t know how to do this kind of match, but I assume that Ruby has a
way?
Use regex capturing with assignment. Capture the string or number in the
From field using parentheses to capture and assign: /(\d+)/
…then compare the captured value, which is $1, with the next string.
Edited to add Ruby example:
x = '12345'
y = '2345'
if ( x =~ /(\d+)/ )
  if ( y == $1 )
    p "Yep!"
  else
    p "Nope!"
  end
end
output
Nope!
Here’s a regex that captures all the cases where To matches From:
I can’t find an easy switch to find the mismatch as you need, but maybe
it’ll provide a starting point.
Plus Rubular is a great resource for exploring regex
cheers
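A pattern of the kind described here might look like the sketch below. It is an assumption for illustration, not the regex originally posted: the sample text and the name same_pairs are made up, and \1 is a backreference that must repeat whatever the first group captured on the To: line.

```ruby
# Sample data modeled on the question (hypothetical, for illustration only).
text = <<~DATA
  To: "1313131"
  From: "1313131"
  random data lines
  To: "1313133"
  From: "1313132"
  random data lines
DATA

# \1 must repeat exactly what group 1 captured on the To: line,
# so only pairs where From equals To can match.
same_pairs = /^To: "(\d+)"\nFrom: "\1"$/

# scan returns one array of groups per match; flatten gives the bare values.
matching_ids = text.scan(same_pairs).flatten
p matching_ids
```

Only the first pair is reported, because the second pair's From value does not repeat the captured To value.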
I use Rubular a lot, it’s great!
Thanks for the ideas. I’ve decided to loop through the file using 2
variables, similarly to Derrick’s suggestion.
Nice trick with “\1” Chris, I haven’t tried using that inside the same
expression before. I’ll see whether I can use that in this instance.
I was wondering whether Ruby’s Regexp had this kind of option built in,
but I guess this scenario is more on the conditional side of
programming.
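The two-variable loop could be sketched roughly as follows. This is only one way to do it: the helper name mismatched_pairs and the sample lines are hypothetical, and it assumes each From: line follows its To: line with the value in double quotes.

```ruby
# Hypothetical helper: walks the lines once, remembering the last To value,
# and collects [to, from] pairs whose values differ.
def mismatched_pairs(lines)
  mismatches = []
  current_to = nil
  lines.each do |line|
    if line =~ /^To: "(.*)"/
      current_to = $1
    elsif line =~ /^From: "(.*)"/ && current_to
      from = $1
      mismatches << [current_to, from] if from != current_to
      current_to = nil # wait for the next To: line
    end
  end
  mismatches
end

# Sample lines modeled on the question (made up for illustration).
sample = [
  %(To: "1313132"\n), %(From: "1313132"\n), "random data lines\n",
  %(To: "1313133"\n), %(From: "1313132"\n), "random data lines\n"
]

result = mismatched_pairs(sample)
p result
```

Because the helper only needs each, it should also accept File.foreach(path) for a large file, so the whole file never has to be loaded into one string.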
On Thu, May 23, 2013 at 11:13 AM, Joel P. wrote:
Here is a regex that works for your example data.
text = '
To: "1313131"
From: "1313131"
random data lines
To: "1313132"
From: "1313132"
random data lines
To: "1313133"
From: "1313132"
random data lines
To: "1313134"
From: "1313134"
random data lines
To: "abc"
From: "def"
random data lines
'
regex = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/
text.scan(regex) # => [["1313133", "1313132"], ["abc", "def"]]
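One edge case worth noting with a (?!\1) lookahead: it only checks that the From value does not start with the To value, so a From value that merely extends the To value is treated as equal. The sketch below, using made-up data and the hypothetical names loose and strict, compares that form with a variant that also includes the closing quote in the lookahead.

```ruby
# Made-up data: the first pair differs, but From starts with the To value.
text = <<~DATA
  To: "123"
  From: "1234"
  random data lines
  To: "abc"
  From: "abc"
  random data lines
DATA

# (?!\1) rejects any From value that begins with the To value.
loose  = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/
# (?!\1") rejects only a From value that is exactly the To value.
strict = /To: "(.*?)"\nFrom: "(?!\1")(.*?)"$/

loose_hits  = text.scan(loose)
strict_hits = text.scan(strict)
p loose_hits
p strict_hits
```

With this data the loose pattern misses the "123"/"1234" pair, while the strict one reports it; neither reports the "abc" pair.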
You can do your regex test against both contexts ^To: and ^From: and
use post_match to reveal the contents after:
~Stu
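A minimal sketch of the post_match idea, assuming each header sits on its own line; the variable names and sample lines are hypothetical. MatchData#post_match returns the part of the string that follows the match.

```ruby
# Sample header lines (made up for illustration).
to_line   = 'To: "1313133"'
from_line = 'From: "1313132"'

# Match only the header prefix; post_match is whatever comes after it.
# &. guards against a line that does not match at all (returns nil).
to_value   = to_line.match(/^To: /)&.post_match
from_value = from_line.match(/^From: /)&.post_match

differs = to_value != from_value
puts "mismatch: To #{to_value}, From #{from_value}" if differs
```

The values keep their surrounding quotes here, which is fine for an equality test but would need stripping before any other use.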
On Fri, May 24, 2013 at 12:43 PM, Joel P. wrote:
Excellent! I tried negatives using (?!\1) before but I couldn’t get them
to work. Thanks for the help.
You can even get the whole line content if you like
irb(main):053:0> s.scan %r{(To:\s+("\d+")\s*$\sFrom:\s+(?!\2).*?(?=To))}m
=> [["To: \"1313133\"\nFrom: \"1313132\"\nrandom data lines\n\n", "\"1313133\""]]
irb(main):054:0> s.scan(%r{(To:\s+("\d+")\s*$\sFrom:\s+(?!\2).*?(?=To))}m).map(&:first)
=> ["To: \"1313133\"\nFrom: \"1313132\"\nrandom data lines\n\n"]
irb(main):055:0> puts s.scan(%r{(To:\s+("\d+")\s*$\sFrom:\s+(?!\2).*?(?=To))}m).map(&:first)
To: "1313133"
From: "1313132"
random data lines
Or with a block:
irb(main):057:0> s.scan %r{(To:\s+("\d+")\s*$\sFrom:\s+(?!\2).*?(?=To))}m do puts $1 end;nil
To: "1313133"
From: "1313132"
random data lines
Kind regards
robert
A group within a group, and scan with a block? I had no idea!
Ruby, you continually delight me
On Sun, May 26, 2013 at 8:06 PM, Joel P. wrote:
A group within a group,
This is regular regular expression functionality: I don’t know a single
regexp engine with support for groups which can’t do that.
and scan with a block? I had no idea!
That is a fairly old feature of the standard lib - even in 1.8.6 - and so
important when scanning large volumes of text.
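As a rough illustration of why the block form helps with large inputs: with a block, String#scan hands each match to the block as it is found instead of collecting every match into one result array. The data and the counter name below are made up for the sketch.

```ruby
# Made-up sample data in the shape used throughout the thread.
text = <<~DATA
  To: "1313132"
  From: "1313132"
  random data lines
  To: "1313133"
  From: "1313132"
  random data lines
DATA

mismatch_count = 0

# With groups in the pattern, the block receives the captured groups,
# which can be destructured into block parameters.
text.scan(/^To: "(\d+)"\nFrom: "(?!\1")(\d+)"$/) do |to, from|
  mismatch_count += 1
  puts "To #{to} differs from From #{from}"
end
```

Each mismatch is handled as soon as it is matched, so nothing accumulates beyond the counter.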
Ruby, you continually delight me
Good!
For a spec of the regexp language, I find this site pretty useful:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
Kind regards
robert