never need to be re-evaluated. The regexp used above must force that
optimization off because #{}, although it always evaluates to the empty
string, is technically dynamic, so the regexp has to be re-evaluated on
every iteration of the loop.
The o flag tells Ruby to interpolate only the first time and then cache
the regexp:
s =~ / test#{} /o
Thanks for the tip! Still, sprinkling that option around might
cause more confusion than assigning the regexp to a constant and using
that reference instead. Someone reading your code later may overlook
the dangling “o” on a complex regexp and be left wondering why the
interpolation isn’t happening like they expect.
I guess it really depends on the scope of your task though.
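To make the difference between these options concrete, here is a small sketch that counts how many distinct Regexp objects each form produces over a few evaluations. The helper distinct_regexps and the constant TEST_RE are hypothetical names for illustration. In MRI, the interpolated literal should produce a new object on every pass, while the /o literal and the constant should each produce exactly one.

```ruby
# Hypothetical constant: the "assign it once" alternative to /o.
TEST_RE = /test/

# Hypothetical helper: evaluate each form n times, keep the resulting
# Regexp objects alive, and count how many distinct objects each produced.
def distinct_regexps(n)
  seen = { interpolated: [], once: [], constant: [] }
  n.times do |i|
    seen[:interpolated] << /test#{i}/   # interpolated and rebuilt on every pass
    seen[:once]         << /test#{i}/o  # interpolated only on the first pass
    seen[:constant]     << TEST_RE      # the same object every time
  end
  seen.transform_values { |objs| objs.map(&:object_id).uniq.size }
end

p distinct_regexps(3)
```

Note that the /o regexp keeps whatever it interpolated on the first pass ("test0" here) for every later evaluation, which is exactly the kind of surprise a reader can miss when the "o" is easy to overlook.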
slower. So I ran some basic benchmarks. Here’s one example:
for a in 0...100000000
  a*2
end
Thanks!
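$ cat mult.rb
#for a in 0...100000000
#  a*2
#end
require 'rubygems'
require 'inline'

class Multiply
  inline do |builder|
    # RubyInline compiles this C function and exposes it as Multiply#mult.
    builder.c "
      long mult(int max) {
        long ctr = 0;
        long result = 0;  /* initialized so that max <= 0 returns 0 */
        while (ctr < max) { result = ctr++ * 2; }
        return result;
      }"
  end
end

puts Multiply.new.mult(100000000)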
My code was merely an example of a very simple loop. Its purpose was not
to calculate anything, but to run through the loop and execute a
multiplication on every iteration.
Yes, and maybe it was a bad example for what you are trying to do:
My main area of development is processing rather large amounts of
data (billions of entries, primarily processed by regular expressions,
with some statistical analysis on top, and potentially NLP added
later). You have to iterate through every entry of the incoming data
(which might already be in the database or a plain text file, or might be
just a “fire hose” of data pouring into the system in real time).
Question: the data needs to come from somewhere. Are you sure that
your processing is CPU bound? If it is IO bound, the difference
between Perl and Ruby won’t really show. I reckon it’s better to
create a more realistic example of what you are trying to do and
measure again. (And take care to alternate the Ruby and Perl test runs
so that OS IO caching doesn’t favor one or the other.)
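One rough way to answer the CPU-bound question is to compare the CPU time a piece of work consumes with the wall-clock time it takes. The sketch below (cpu_vs_wall is a hypothetical helper name) uses Process.clock_gettime with the process CPU-time clock. A ratio close to 1.0 points to CPU-bound work; a ratio well below 1.0 means the process spent much of its time waiting, for example on I/O. With several busy native threads the ratio can exceed 1.0.

```ruby
# Hypothetical helper: run a block and report wall-clock time, process CPU
# time, and their ratio.
def cpu_vs_wall
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  { wall: wall, cpu: cpu, ratio: wall.zero? ? 0.0 : cpu / wall }
end

busy    = cpu_vs_wall { 2_000_000.times { |i| i * 2 } }  # pure computation
waiting = cpu_vs_wall { sleep 0.2 }                      # stands in for waiting on I/O
printf("busy:    cpu/wall ratio %.2f\n", busy[:ratio])
printf("waiting: cpu/wall ratio %.2f\n", waiting[:ratio])
```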
IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks are passed). They’re a throwback for Java devs
and serve no purpose but to make things more verbose. In this specific
case, there are tangible reasons to use =~ over #match.
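The thread doesn't spell those reasons out, but one concrete difference is what each call returns. In this minimal sketch (the sample string is made up), =~ returns the match offset or nil, #match returns a MatchData object, and Regexp#match? (added in Ruby 2.4, well after this discussion) returns only true or false and does not update $~.

```ruby
s = "this is a test line"

idx = s =~ /test/       # Integer offset of the match, or nil; sets $~
md  = /test/.match(s)   # MatchData object, or nil; sets $~
hit = /test/.match?(s)  # true or false only; leaves $~ untouched

puts idx    # => 10
puts md[0]  # => test
puts hit    # => true
```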
I tried to use Regexp.new because I figured it would pre-compile the
regexp, the way “qr/ test /” does in Perl, so that it doesn’t have to
be recompiled on every iteration.
Not necessary in Ruby: regexp literals are treated specially and are
not recompiled. Usually it’s faster to do
io.each do |line|
  if line =~ /foo/
    # handle the matching line
  end
end
than
rx = /foo/
io.each do |line|
  if line =~ rx
    # handle the matching line
  end
end
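A quick way to check that claim in MRI is to compare object identities: evaluating the same regexp literal repeatedly hands back one and the same object, whereas every Regexp.new call builds (and compiles) a fresh one. A minimal sketch:

```ruby
# Evaluate the same literal three times, keeping the resulting objects.
literals = Array.new(3) { /foo/ }
# Regexp.new constructs and compiles a new Regexp on every call.
built = Array.new(3) { Regexp.new("foo") }

puts literals.map(&:object_id).uniq.size  # => 1 (one object, created once)
puts built.map(&:object_id).uniq.size     # => 3 (a new object per call)
```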
If there is dynamic content, use /o:
input = gets.chomp  # drop the trailing newline before interpolating
io.each do |line|
  if line =~ /foo:#{input}/o
    # handle the matching line
  end
end
Ummmm… yeah… every time you tell it to do ten times as many loops,
it takes almost ten times as long. What’s so surprising?
If you tweak the code to make it report how long each loop took, you’ll
see that it actually gets FASTER at first, presumably because fixed
overhead is spread over more iterations, then a tiny pinch slower
(possibly due to the switch to a different kind of number), but by
little enough that IMHO it’s lost in the noise.
puts("Pwr  Tot Secs  uS/Loop")
(3...8).each do |x|
  limit = 10**x
  start_time = Time.now
  for a in 0...limit
    # do nothing here, just timing how long the loops take
  end
  elapsed = Time.now - start_time
  puts(" #{x}  #{elapsed}  #{elapsed * 1_000_000.0 / limit}")
end
Hard to believe that this thread has gone this long without a mention of
other Ruby runtimes.
You may want to also benchmark with JRuby (jruby.org) or with Rubinius
(rubini.us). For ease of installation, you may want to consider using
“rvm” to manage your Rubies (google for it to figure out how to install
it).
Question: the data needs to come from somewhere. Are you sure that
your processing is CPU bound? If it is IO bound, the difference
between Perl and Ruby won’t really show. I reckon it’s better to
create a more realistic example of what you are trying to do and
measure again. (And take care to alternate the Ruby and Perl test runs
so that OS IO caching doesn’t favor one or the other.)
Yep, I’m sure it’s CPU bound: the CPU load is at 100%. Unfortunately,
at this point the data comes in faster than the Ruby script can process
it. I’m trying to optimize it, of course, but so far the Perl version
beats Ruby hands down, and there don’t seem to be many options at my
disposal. In my examples, the “for” seems to be the major culprit: on
its own, without ANYTHING inside the loop, it takes 19 seconds to run
1E8 times. Using (0...1E8).each instead only saves me about 1 second.
That doesn’t really matter, though, since most of the loops in my
scripts are “while” loops anyway. Still, the regexps themselves run very
slowly. I wish Ruby used PCRE (or an engine as fast as Perl’s own); that
would at least make regexps run as fast as they do in Perl, and I would
be able to write my scripts in Ruby.
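The loop forms discussed in this thread can be timed side by side with the standard benchmark library. This is a sketch, assuming a much smaller N than the thread's 1E8 so that it finishes quickly; absolute numbers depend heavily on the interpreter and the machine, so no results are shown here.

```ruby
require 'benchmark'

N = 5_000_000  # far below the thread's 1E8 so the sketch runs quickly

# Each report runs N iterations of an empty loop body in a different style.
results = Benchmark.bm(7) do |bm|
  bm.report("for:") do
    for a in 0...N
    end
  end
  bm.report("each:")  { (0...N).each { |a| } }
  bm.report("times:") { N.times { |a| } }
  bm.report("while:") do
    a = 0
    a += 1 while a < N
  end
end
```

Running the same file under each Ruby implementation is a simple way to reproduce the kind of comparison reported later in the thread.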
If you really need performance in the end, however, you might want to
consider coding your critical code paths in something like C and then
calling those from Ruby as a direct extension or using something like
ffi to call into a DLL containing the logic. Your overall code base may
be a little messy, but sometimes the speed you need requires such a
trade-off. Hopefully, you can keep the mess limited to only a small set
of your overall application logic. Of course, the same holds true for
Perl in this regard.
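Ruby's own distribution also ships Fiddle, a libffi wrapper that plays a role similar to the ffi gem mentioned above. As a minimal sketch, assuming a Unix-like system where the C library's strlen is already loaded into the Ruby process, this wraps a C function and calls it directly from Ruby. strlen is only a stand-in: in practice the handle would point at a shared library containing your own compiled hot path.

```ruby
require 'fiddle'

# Look up strlen(3) among the symbols already loaded into this process
# and wrap it as a callable function: size_t strlen(const char *).
libc   = Fiddle::Handle::DEFAULT
strlen = Fiddle::Function.new(libc['strlen'],
                              [Fiddle::TYPE_VOIDP],  # const char *
                              Fiddle::TYPE_SIZE_T)   # size_t

puts strlen.call("hello, world")  # => 12
```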
Well, the performance of Perl has so far been very satisfactory. In
fact, as far as RegExps are concerned, I could barely match Perl’s
performance in C++ (and even then had to mix in some plain C code). So
far, it seems, I’m stuck with Perl. Not that it’s really a bad
thing: I’ve been developing in it since 1997, so I know it pretty well,
while I’ve only spent about 2 weeks with Ruby.
I guess I will have to wait and see if the Ruby interpreter becomes more
efficient. But I have to confess: I’m REALLY tempted to, in some
cases, forgo the performance in favor of handsome code.
Here’s a question: when I say “for a in ( 0...1E8 )”, does Ruby create
an array and populate it with values from 0 through 1E8, or does it
merely create a counter similar to “for( a = 0; a<=1e8; a++ )”?
It might make a counter-based loop internally, but at a higher level
it translates it into an iterator using Range#each. The iterator
knows how to return the next number in the range each time. It doesn’t
create an array with 1E8+1 elements.
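Two small checks illustrate this. A for loop simply calls #each on whatever follows in, so any object with an #each method works (Countdown below is a hypothetical class), and a Range stores only its endpoints, so asking for its size or its first few elements does not build an array.

```ruby
# Hypothetical class that defines only #each, which is all `for` needs.
class Countdown
  def initialize(from)
    @from = from
  end

  def each
    n = @from
    while n > 0
      yield n
      n -= 1
    end
    self
  end
end

seen = []
for n in Countdown.new(3)  # calls Countdown#each under the hood
  seen << n
end
p seen  # => [3, 2, 1]

big = (0...10**8)  # a Range: just two endpoints, no elements stored
p big.size         # => 100000000
p big.first(3)     # => [0, 1, 2]
```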
I guess I will have to wait and see if the Ruby interpreter becomes more
efficient. But I have to confess: I’m REALLY tempted to, in some
cases, forgo the performance in favor of handsome code.
Do you realize that there are multiple Ruby interpreters with
different implementations and strengths? Have you tried your code with
one of the others? The two other major ones are:
It creates a Range, which just iterates, not an array. A more idiomatic
way would probably be (1+10**8).times { … }
As an aside, if all the processing is happening in the loop, then it
might make more sense for the loop to just delegate work out to other
processes (e.g. parsing a line or processing a parsed set of data).
This could be pretty simple if done with a thread pool in a single Ruby
script (you’ll want one of the alternate implementations here, since
you’re CPU bound and MRI has a GIL), or as arbitrarily complex as you
like.
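A minimal sketch of the single-script thread-pool variant, using only the standard library's Queue and Thread. The helper name process_in_pool and the :done stop marker are hypothetical choices (items must not themselves be :done). As the paragraph notes, on MRI the GIL keeps CPU-bound Ruby code in these threads from running in parallel, so this only pays off on an implementation without a GIL, such as JRuby, or when the work is mostly waiting on I/O.

```ruby
require 'etc'

# Hypothetical helper: run the block on every item using a fixed pool of
# worker threads. Results come back in completion order, not input order.
def process_in_pool(items, workers: Etc.nprocessors, &work)
  jobs    = Queue.new
  results = Queue.new
  items.each { |item| jobs << item }
  workers.times { jobs << :done }   # one stop marker per worker

  threads = Array.new(workers) do
    Thread.new do
      while (item = jobs.pop) != :done
        results << work.call(item)  # e.g. parse one line
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }
end

lines  = ["a=1", "b=2", "c=3"]
parsed = process_in_pool(lines, workers: 2) { |line| line.split("=") }
p parsed.sort  # => [["a", "1"], ["b", "2"], ["c", "3"]]
```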
It creates a Range, which just iterates, not an array. A more idiomatic
way would probably be (1+10**8).times { … }
Phew…
As an aside, if all the processing is happening in the loop, then it
might make more sense for the loop to just delegate work out to other
processes (e.g. parsing a line or processing a parsed set of data).
This could be pretty simple if done with a thread pool in a single Ruby
script (you’ll want one of the alternate implementations here, since
you’re CPU bound and MRI has a GIL), or as arbitrarily complex as you
like.
Yeah, that’s how it works in my Perl version - it all runs on Amazon,
with workload delegated to “worker” servers in a MapReduce-like fashion,
using Redis for inter-server communication.
Do you realize that there are multiple Ruby interpreters with
different implementations and strengths? Have you tried your code with
one of the others? The two other major ones are:
Tried Rubinius and JRuby. Rubinius so far is the fastest one, but still
slower than Perl. The empty “for” loop runs about as fast as “while” or
.each (10 seconds on Rubinius vs. 3.5 seconds in Perl), although .times
takes only 6 seconds.
A regexp match using / test /.match performs about the same as
s =~ / test /: about 5 seconds vs. Perl’s 1.4s (1e7 repetitions).
But seeing as there’s such a huge difference in the empty loop alone,
it’s hard to tell whether the regexps themselves are slower in Ruby or
whether it’s mostly the loop overhead…
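One way to separate the two effects is to time an empty loop and the same loop with the match inside it, then subtract. This is a rough sketch using Benchmark.realtime; regexp_cost and the sample string are hypothetical, and the difference is only an estimate, since timer noise and GC affect both runs.

```ruby
require 'benchmark'

# Hypothetical helper: time n empty iterations, then n iterations that
# perform the match; the difference approximates the regexp work alone.
def regexp_cost(n, s = "this is a test line")
  loop_only = Benchmark.realtime { n.times { } }
  with_re   = Benchmark.realtime { n.times { s =~ / test / } }
  { loop: loop_only, total: with_re, regexp: with_re - loop_only }
end

r = regexp_cost(1_000_000)
printf("loop %.3fs  total %.3fs  regexp approx %.3fs\n",
       r[:loop], r[:total], r[:regexp])
```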
Ummmm… yeah… every time you tell it to do ten times as many loops,
it takes almost ten times as long. What’s so surprising?
I noticed the factor of 10 too late. What I was emphasizing is that,
given the simple loop above, your point of acceptance should be less
than 10**7; beyond that, you’d get unacceptable response times. Just
imagine, 7 seconds! That would not be acceptable for, say, database
apps that need response times of less than 5 seconds.