String: iterate through regex matches with position

aris · September 11, 2012, 4:46pm

Hi,

First of all, sorry if this is a duplicate question (I have scanned through
the latest answers regarding regex and didn’t get any ideas).

I am scanning a string in order to detect correctly formed “records” in
it.
A correct record is an “SP” mark followed by zero or more “NL” marks and
an ending “EP” mark.
If we find two EPs without an SP in between, two SPs without an EP in
between, or a mark other than “NL” between the SP and EP marks, the
record is invalid.

“BS HD SP SP EP SP NL EP EP FT BS”

We have the following records:

SP EP
SP NL EP

I scan through them and I am able to retrieve them with:

string.scan(/(SP)\s((?:NL\s)*)(EP)/)

But I am not getting the start and end position of each match inside the
string (which I need in order to retrieve data from another place).

Is there any way to scan the string for matches where I get the index
position?

Maybe I should not even be using scan?

Thanks for your help and time.

Regards,
V.

Vicente_B · September 11, 2012, 5:25pm

Hi,

The MatchData object in $~ has an “offset” method to retrieve the start
and end offset of a capture group. However, I don’t understand why you
capture “SP” and “EP”.

string.scan(/SP\s((?:NL\s)*)EP/) do
  # offset(1) is the span of capture group 1; offset(0) is the whole match
  p $~.offset(1)
end
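
To make the difference between the group offset and the whole-match offset concrete, here is a minimal sketch run against the sample string from the question. The variable names `whole` and `groups` are only illustrative and are not from the original posts.

```ruby
string = "BS HD SP SP EP SP NL EP EP FT BS"

whole  = []  # [start, end] of each complete record
groups = []  # [start, end] of the captured run of NL marks inside each record

string.scan(/SP\s((?:NL\s)*)EP/) do
  md = Regexp.last_match   # the same MatchData object as $~
  whole  << md.offset(0)   # group 0 is the entire match
  groups << md.offset(1)   # group 1 is the (possibly empty) NL run
end

p whole   # => [[9, 14], [15, 23]]
p groups  # => [[12, 12], [18, 21]]
```

The end offset is exclusive, so an empty capture such as the one in “SP EP” shows up as a pair of equal numbers.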

Vicente_B · September 11, 2012, 8:27pm

On Tue, Sep 11, 2012 at 4:44 PM, Vicente B. wrote:

See Jan’s reply for obtaining the position.

Maybe I should not even be using scan?

If you need to process the content in between then you could also use
#split:

m = string.split /((?:SP)\s(?:NL\s)*EP)/

(When #split is used with capturing groups those are retained in the
resulting array.)
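
As a rough illustration of what that array looks like for the sample string from the question (a sketch only; the names `parts` and `records` are made up here):

```ruby
string = "BS HD SP SP EP SP NL EP EP FT BS"

# The outer capturing group makes #split keep each matched record
# in the result, interleaved with the text between records.
parts = string.split(/((?:SP)\s(?:NL\s)*EP)/)
p parts
# => ["BS HD SP ", "SP EP", " ", "SP NL EP", " EP FT BS"]

# The records land at the odd indices, the surrounding text at the even ones.
records = parts.select.with_index { |_, i| i.odd? }
p records
# => ["SP EP", "SP NL EP"]
```

Note that #split does not report positions by itself; if the offsets are needed they would have to be derived from the lengths of the preceding parts.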

Kind regards

robert

Vicente_B · September 12, 2012, 9:56am

Thanks for the answers!! Going to go with Regexp.last_match

Vicente_B · September 11, 2012, 10:48pm

Vicente B. wrote in post #1075485:

Is there any way to scan the string for matches where I get the index
position?

str = "BS HD SP SP EP SP NL EP EP FT BS SP\tNL NL\nNL EP"

str.scan(/
  SP
  \s+
  (?:NL\s+)*
  EP
/xms) do |match|
  md = Regexp.last_match
  puts "#{match.inspect} => #{md.offset(0)}"
end

--output:--
"SP EP" => [9, 14]
"SP NL EP" => [15, 23]
"SP\tNL NL\nNL EP" => [33, 47]
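
Since the original question needed the positions in order to look data up elsewhere, one possible follow-up is to collect the offsets first and use them afterwards. This is only a sketch: the names `record` and `spans` are assumptions, and here the offsets are simply used to slice the same string back out.

```ruby
str = "BS HD SP SP EP SP NL EP EP FT BS SP\tNL NL\nNL EP"
record = /SP\s+(?:NL\s+)*EP/

# Collect the [start, end] pair of every whole match.
spans = []
str.scan(record) { spans << Regexp.last_match.offset(0) }

spans.each do |start, finish|
  # The end offset is exclusive, hence the three-dot range.
  puts "#{start}...#{finish}: #{str[start...finish].inspect}"
end
```

The same `start...finish` ranges could be applied to any other string or array that is aligned with `str`.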