I’ve found an anomaly in the way Ruby handles non-greedy regular
expressions and wonder whether it’s been discussed before. A search of
the documentation and a general Internet search didn’t turn up
information on this issue.
When I want to match the first quoted string in a string such as:
"aaaaa""bbb""ccc"
I match the last quoted string instead. The exact characters don’t
matter.
Here’s the sample code; note that (.*?) and ([^"]+) behave the same
way, and not the way I’d expect:
str = '"aaaaa""bbb""ccc"'
str.scan(/"(.*?)"/)
puts $1
# ccc
Andy Oram
str.scan(/"([^"]+)"/)
puts $1
# ccc
str.scan(/"(.*?)"(.*)/)
puts $1
# aaaaa
Adding an extra (.*) to the end produces the result I want, but I
don’t believe it should make any difference.
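One possible explanation, offered as a sketch rather than a definitive diagnosis: String#scan does not stop at the first match but collects every match in the string, so $1 afterward would reflect the last one it found. That would also explain why the trailing (.*) changes the result, since it swallows the rest of the string and leaves scan only one match to find. The variable names below are made up for illustration; the methods used (scan, =~, String#[] with a capture index, and match) are standard Ruby.

```ruby
str = '"aaaaa""bbb""ccc"'

# scan walks the whole string and returns every match,
# one inner array of capture groups per match.
all_matches = str.scan(/"(.*?)"/)
last_capture_after_scan = $1          # $1 is left pointing at the final match

# The first quoted string is still available from scan's return value.
first_via_scan = all_matches.first.first

# =~ stops at the first match, like the Perl version.
str =~ /"(.*?)"/
first_via_operator = $1

# String#[] with a regexp and a group index returns that capture directly.
first_via_index = str[/"([^"]+)"/, 1]

# match returns a MatchData object for the first match only.
first_via_match = str.match(/"(.*?)"/)[1]

puts all_matches.inspect
puts last_capture_after_scan
puts first_via_scan, first_via_operator, first_via_index, first_via_match
```

If this reading is right, the non-greedy quantifier is behaving normally, and =~ or match is the closer equivalent of the Perl and PHP calls below.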
Here is the equivalent Perl, which works as expected:
$str = q{"aaaaa""bbb""ccc"};
$str =~ /"(.*?)"/;
print $1 , "\n";
# aaaaa
$str =~ /"([^"]+)"/;
print $1 , "\n";
# aaaaa
$str =~ /"(.*?)"(.*)/;
print $1 , "\n";
# aaaaa
And the equivalent PHP:
<?php
$str = '"aaaaa""bbb""ccc"';
preg_match('/"(.*?)"/', $str, $matches);
echo $matches[1] , "\n";
// aaaaa
preg_match('/"([^"]+)"/', $str, $matches);
echo $matches[1] , "\n";
// aaaaa
preg_match('/"(.*?)"(.*)/', $str, $matches);
echo $matches[1] , "\n";
// aaaaa
?>