Count substrings from a string

Hi,

I am able to know if there is a substring in a string with:

if string =~ /#{word}/:
puts “match”
else
puts “do not match”
end

But I would like to count all the substring from a string…

Thanks for help :slight_smile:

Zangief I. wrote:

Hi,

I am able to know if there is a substring in a string with:

if string =~ /#{word}/:
puts “match”
else
puts “do not match”
end

But I would like to count all the substring from a string…

Thanks for help :slight_smile:

irb(main):001:0> “one word is one word, never be three
words”.scan(/word/).length
=> 3

:-), Wolfgang Nádasi-Donner

On Nov 18, 2007 3:56 PM, Wolfgang Nádasi-Donner [email protected]
wrote:


Posted via http://www.ruby-forum.com/.

It is not clear what you want, I use an idiom like the following to
allow for overlap

class String
def all_subs substr, overlap = false
return scan(substr) unless overlap
(0…size).inject([]){ |r, i|
r <<
( substr.match(self[i…-1])[0] rescue nil )
}.compact
end
end

p “aaaa”.all_subs( /aaa/ )
p “aaaa”.all_subs( /aaa/, true )

too bad scan does not have an overlap parameter

HTH
Robert

Robert D. wrote:

too bad scan does not have an overlap parameter

It has the possibility :slight_smile:

irb(main):001:0> “aaaaaa”.scan(/(?=aa)/) do
irb(main):002:1* puts “#{$~.pre_match}-#{$~[0]}-#{$~.post_match}”
irb(main):003:1> end
–aaaaaa
a–aaaaa
aa–aaaa
aaa–aaa
aaaa–aa
=> “aaaaaa”
irb(main):004:0> “aaaaaa”.scan(/(?=aa)/).length
=> 5

With the look ahead features severals things can be done.

Wolfgang Nádasi-Donner

On Nov 18, 2007 4:16 PM, Wolfgang Nádasi-Donner [email protected]
wrote:

Robert D. wrote:

too bad scan does not have an overlap parameter

It has the possibility :slight_smile:
I am not willing to pay that price;)
but let me see, in case of sparse overlaps ( and that is often the
case ) the lookahead might be less costly than my
brute force approach, hmm, not sure, I will try

38/38 > cat substr.rb && ruby substr.rb

vim: sw=2 ts=2 ft=ruby expandtab tw=0 nu syn:

class String
def bf_sub substr
(0…size).inject([]){ |r, i|
r <<
( substr.match(self[i…-1])[0] rescue nil )
}.compact
end

def la_sub substr
scan /(?=#{substr})/
end
end

require ‘benchmark’

N = 1_000
A = “a” * 100
S = /aa/
B = "b " * 20
T = /b\s/
Benchmark::bmbm do |bm|
bm.report “look ahead on many occurrances” do
N.times do
A.la_sub S
end
end
bm.report “brute force on many occurrances” do
N.times do
A.bf_sub S
end
end

bm.report “look ahead on sparse occurrances” do
N.times do
B.la_sub T
end
end
bm.report “brute force on sparse occurrances” do
N.times do
B.bf_sub T
end
end
end

Rehearsal

look ahead on many occurrances 0.150000 0.000000 0.150000 (
0.240576)
brute force on many occurrances 1.020000 0.030000 1.050000 (
1.327992)
look ahead on sparse occurrances 0.050000 0.000000 0.050000 (
0.125577)
brute force on sparse occurrances 2.720000 0.090000 2.810000 (
4.508353)
------------------------------------------------------------ total:
4.060000sec

                                    user     system      total 

real
look ahead on many occurrances 0.150000 0.000000 0.150000 (
0.184002)
brute force on many occurrances 0.890000 0.050000 0.940000 (
1.243384)
look ahead on sparse occurrances 0.050000 0.000000 0.050000 (
0.051307)
brute force on sparse occurrances 2.620000 0.090000 2.710000 (
3.080540)

$stdout.sub(“urranc”,“urrenc”)

Turns out lookahead is quite ok, brute force hopeless even in the case
suiting it best :(, wonder how this would scale if there
were a Regexp#match allowing for an offset.

Cheers
Robert

Robert D. wrote:

…wonder how this would scale if there were a Regexp#match allowing for an offset.

…which will come in Ruby 1.9:

C:\Dokumente und Einstellungen\wolfgang>irb19
irb(main):001:0> “abcdef”.match(/./,3)
=> #<MatchData “d”>

Wolfgang Nádasi-Donner

Thanks a lot.

On 18.11.2007 19:57, Robert D. wrote:

Great :slight_smile:

I have written a 1.9 test and still lookahead is much better, I am
surpprised as I got rid of all fancy stuff in the
bf_sub method going back to my Regexp stuff happily again.

IMHO it is not surprising because the lookahead method uses one method
(scan) which is implemented in C while you create a lot of temporary
objects and also start scanning at every position in the string.
Usually regexp engines are implemented much smarter than we can do in
short method. I am sure you have heard of Boyer Moore and Knuth Morris
Pratt algorithms… :slight_smile:

def bf_sub substr
r = []
size.times do |i|
s =
substr.match(self,i)
r << ( s && s[0] )
end
r.compact
end

Kind regards

robert

On Nov 18, 2007 6:29 PM, Wolfgang Nádasi-Donner [email protected]
wrote:

Robert D. wrote:

…wonder how this would scale if there were a Regexp#match allowing for an offset.

…which will come in Ruby 1.9:

C:\Dokumente und Einstellungen\wolfgang>irb19
irb(main):001:0> “abcdef”.match(/./,3)
=> #<MatchData “d”>

Great :slight_smile:

I have written a 1.9 test and still lookahead is much better, I am
surpprised as I got rid of all fancy stuff in the
bf_sub method going back to my Regexp stuff happily again.

def bf_sub substr
r = []
size.times do |i|
s =
substr.match(self,i)
r << ( s && s[0] )
end
r.compact
end

512/12 > ruby1.9 substr-1.9.rb
Rehearsal

look ahead on many occurrances 0.200000 0.000000 0.200000 (
0.308822)
brute force on many occurrances 0.530000 0.000000 0.530000 (
0.649798)
look ahead on sparse occurrances 0.070000 0.000000 0.070000 (
0.171571)
brute force on sparse occurrances 1.260000 0.000000 1.260000 (
1.517635)
------------------------------------------------------------ total:
2.060000sec

                                    user     system      total 

real
look ahead on many occurrances 0.200000 0.000000 0.200000 (
0.531080)
brute force on many occurrances 0.530000 0.000000 0.530000 (
0.641892)
look ahead on sparse occurrances 0.070000 0.000000 0.070000 (
0.158519)
brute force on sparse occurrances 1.240000 0.000000 1.240000 (
1.389619)

Original test
507/7 > ruby substr.rb
Rehearsal

look ahead on many occurrances 0.140000 0.000000 0.140000 (
0.148194)
brute force on many occurrances 0.770000 0.020000 0.790000 (
1.049865)
look ahead on sparse occurrances 0.040000 0.000000 0.040000 (
0.059984)
brute force on sparse occurrances 2.130000 0.050000 2.180000 (
2.316258)
------------------------------------------------------------ total:
3.150000sec

                                    user     system      total 

real
look ahead on many occurrances 0.120000 0.000000 0.120000 (
0.147802)
brute force on many occurrances 0.770000 0.030000 0.800000 (
0.838135)
look ahead on sparse occurrances 0.040000 0.000000 0.040000 (
0.057320)
brute force on sparse occurrances 2.070000 0.070000 2.140000 (
2.398068)

Wolfgang Nádasi-Donner

Posted via http://www.ruby-forum.com/.

R.