Hi,
I am able to know if there is a substring in a string with:
if string =~ /#{word}/:
puts “match”
else
puts “do not match”
end
But I would like to count all the substring from a string…
Thanks for help
Hi,
I am able to know if there is a substring in a string with:
if string =~ /#{word}/:
puts “match”
else
puts “do not match”
end
But I would like to count all the substring from a string…
Thanks for help
Zangief I. wrote:
Hi,
I am able to know if there is a substring in a string with:
if string =~ /#{word}/:
puts “match”
else
puts “do not match”
endBut I would like to count all the substring from a string…
Thanks for help
irb(main):001:0> “one word is one word, never be three
words”.scan(/word/).length
=> 3
:-), Wolfgang Nádasi-Donner
On Nov 18, 2007 3:56 PM, Wolfgang Nádasi-Donner [email protected]
wrote:
–
Posted via http://www.ruby-forum.com/.
It is not clear what you want, I use an idiom like the following to
allow for overlap
class String
def all_subs substr, overlap = false
return scan(substr) unless overlap
(0…size).inject([]){ |r, i|
r <<
( substr.match(self[i…-1])[0] rescue nil )
}.compact
end
end
p “aaaa”.all_subs( /aaa/ )
p “aaaa”.all_subs( /aaa/, true )
too bad scan does not have an overlap parameter
HTH
Robert
Robert D. wrote:
too bad scan does not have an overlap parameter
It has the possibility
irb(main):001:0> “aaaaaa”.scan(/(?=aa)/) do
irb(main):002:1* puts “#{$~.pre_match}-#{$~[0]}-#{$~.post_match}”
irb(main):003:1> end
–aaaaaa
a–aaaaa
aa–aaaa
aaa–aaa
aaaa–aa
=> “aaaaaa”
irb(main):004:0> “aaaaaa”.scan(/(?=aa)/).length
=> 5
With the look ahead features severals things can be done.
Wolfgang Nádasi-Donner
On Nov 18, 2007 4:16 PM, Wolfgang Nádasi-Donner [email protected]
wrote:
Robert D. wrote:
too bad scan does not have an overlap parameter
It has the possibility
I am not willing to pay that price;)
but let me see, in case of sparse overlaps ( and that is often the
case ) the lookahead might be less costly than my
brute force approach, hmm, not sure, I will try
38/38 > cat substr.rb && ruby substr.rb
class String
def bf_sub substr
(0…size).inject([]){ |r, i|
r <<
( substr.match(self[i…-1])[0] rescue nil )
}.compact
end
def la_sub substr
scan /(?=#{substr})/
end
end
require ‘benchmark’
N = 1_000
A = “a” * 100
S = /aa/
B = "b " * 20
T = /b\s/
Benchmark::bmbm do |bm|
bm.report “look ahead on many occurrances” do
N.times do
A.la_sub S
end
end
bm.report “brute force on many occurrances” do
N.times do
A.bf_sub S
end
end
bm.report “look ahead on sparse occurrances” do
N.times do
B.la_sub T
end
end
bm.report “brute force on sparse occurrances” do
N.times do
B.bf_sub T
end
end
end
look ahead on many occurrances 0.150000 0.000000 0.150000 (
0.240576)
brute force on many occurrances 1.020000 0.030000 1.050000 (
1.327992)
look ahead on sparse occurrances 0.050000 0.000000 0.050000 (
0.125577)
brute force on sparse occurrances 2.720000 0.090000 2.810000 (
4.508353)
------------------------------------------------------------ total:
4.060000sec
user system total
$stdout.sub(“urranc”,“urrenc”)
Turns out lookahead is quite ok, brute force hopeless even in the case
suiting it best :(, wonder how this would scale if there
were a Regexp#match allowing for an offset.
Cheers
Robert
Robert D. wrote:
…wonder how this would scale if there were a Regexp#match allowing for an offset.
…which will come in Ruby 1.9:
C:\Dokumente und Einstellungen\wolfgang>irb19
irb(main):001:0> “abcdef”.match(/./,3)
=> #<MatchData “d”>
Wolfgang Nádasi-Donner
Thanks a lot.
On 18.11.2007 19:57, Robert D. wrote:
Great
I have written a 1.9 test and still lookahead is much better, I am
surpprised as I got rid of all fancy stuff in the
bf_sub method going back to my Regexp stuff happily again.
IMHO it is not surprising because the lookahead method uses one method
(scan) which is implemented in C while you create a lot of temporary
objects and also start scanning at every position in the string.
Usually regexp engines are implemented much smarter than we can do in
short method. I am sure you have heard of Boyer Moore and Knuth Morris
Pratt algorithms…
def bf_sub substr
r = []
size.times do |i|
s =
substr.match(self,i)
r << ( s && s[0] )
end
r.compact
end
Kind regards
robert
On Nov 18, 2007 6:29 PM, Wolfgang Nádasi-Donner [email protected]
wrote:
Robert D. wrote:
…wonder how this would scale if there were a Regexp#match allowing for an offset.
…which will come in Ruby 1.9:
C:\Dokumente und Einstellungen\wolfgang>irb19
irb(main):001:0> “abcdef”.match(/./,3)
=> #<MatchData “d”>
Great
I have written a 1.9 test and still lookahead is much better, I am
surpprised as I got rid of all fancy stuff in the
bf_sub method going back to my Regexp stuff happily again.
def bf_sub substr
r = []
size.times do |i|
s =
substr.match(self,i)
r << ( s && s[0] )
end
r.compact
end
look ahead on many occurrances 0.200000 0.000000 0.200000 (
0.308822)
brute force on many occurrances 0.530000 0.000000 0.530000 (
0.649798)
look ahead on sparse occurrances 0.070000 0.000000 0.070000 (
0.171571)
brute force on sparse occurrances 1.260000 0.000000 1.260000 (
1.517635)
------------------------------------------------------------ total:
2.060000sec
user system total
real
look ahead on many occurrances 0.200000 0.000000 0.200000 (
0.531080)
brute force on many occurrances 0.530000 0.000000 0.530000 (
0.641892)
look ahead on sparse occurrances 0.070000 0.000000 0.070000 (
0.158519)
brute force on sparse occurrances 1.240000 0.000000 1.240000 (
1.389619)
look ahead on many occurrances 0.140000 0.000000 0.140000 (
0.148194)
brute force on many occurrances 0.770000 0.020000 0.790000 (
1.049865)
look ahead on sparse occurrances 0.040000 0.000000 0.040000 (
0.059984)
brute force on sparse occurrances 2.130000 0.050000 2.180000 (
2.316258)
------------------------------------------------------------ total:
3.150000sec
user system total
real
look ahead on many occurrances 0.120000 0.000000 0.120000 (
0.147802)
brute force on many occurrances 0.770000 0.030000 0.800000 (
0.838135)
look ahead on sparse occurrances 0.040000 0.000000 0.040000 (
0.057320)
brute force on sparse occurrances 2.070000 0.070000 2.140000 (
2.398068)
Wolfgang Nádasi-Donner
Posted via http://www.ruby-forum.com/.
R.
This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.
Sponsor our Newsletter | Privacy Policy | Terms of Service | Remote Ruby Jobs