Basic Ruby performance

dniq · February 3, 2012, 4:47am

On Thu, Feb 2, 2012 at 8:57 PM, Jeremy B. wrote:

never need to be re-evaluated. The regexp used above must force that
optimization off because #{}, while always evaluated to the empty
string, is technically dynamic; thus the regexp needs to be re-evaluated
in every iteration of the loop.

The o flag tells Ruby to interpolate only the first time, and then cache
the regex:

s =~ / test#{} /o
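
A small sketch of what that caching means in practice, assuming MRI's behaviour for /o. The helper names match_cached and match_fresh and the sample strings are made up for illustration; they are not from the thread.

```ruby
# Hypothetical helper: with /o, Ruby interpolates `pattern` only the
# first time this regexp literal is evaluated, then reuses the result.
def match_cached(pattern, text)
  text =~ /#{pattern}/o
end

# Same helper without /o: the regexp is rebuilt on every call.
def match_fresh(pattern, text)
  text =~ /#{pattern}/
end

first  = match_cached("foo", "foo bar")  # builds /foo/ and caches it
second = match_cached("bar", "foo bar")  # still matches with /foo/
fresh  = match_fresh("bar", "foo bar")   # matches with /bar/

puts first.inspect, second.inspect, fresh.inspect
```

The second call silently ignores its new pattern, which is the kind of surprise /o can cause when the interpolated value is expected to change.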

dniq · February 3, 2012, 5:04am

On Fri, Feb 3, 2012 at 11:52 AM, botp wrote:

1.939860867

x=Time.now();for a in 0..1E8;end; puts(Time.now-x)
19.276653266
note, the jump…

using plain fixnum may help, but the jump is still there

(3..8).each do |x|
  t=Time.now();for a in 0..10**x;end; puts("#{x} #{Time.now()-t}")
end

3 0.000142406
4 0.001344933
5 0.014539207
6 0.076141941
7 0.737979205
8 7.359555691

best regards -botp
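
A rough sketch of the "plain fixnum" point, not a measurement from the thread: 1E8 is a Float literal, while 10**8 is an Integer, so the two ranges have different kinds of endpoints. The limits are scaled down to 1E6 here so it finishes quickly, and the variable names are illustrative.

```ruby
require 'benchmark'

float_limit = 1E6     # Float, like the 1E8 in the one-liner above
int_limit   = 10**6   # Integer (Fixnum on 1.8/1.9)

puts float_limit.class
puts int_limit.class

float_count = 0
int_count   = 0

# Both loops visit the same integers; only the range endpoint type differs.
float_secs = Benchmark.realtime do
  for a in 0...float_limit
    float_count += 1
  end
end

int_secs = Benchmark.realtime do
  for a in 0...int_limit
    int_count += 1
  end
end

puts "Float end:   #{float_secs}"
puts "Integer end: #{int_secs}"
```

Timings will vary by machine and Ruby version, so treat any difference as something to verify locally.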

dniq · February 3, 2012, 5:08am

On 02/02/2012 09:47 PM, Josh C. wrote:

s =~ / test#{} /o
Thanks for the tip! Still, sprinkling that option around might
cause more confusion than assigning the regexp to a constant and using
that reference instead. Someone reading your code later may overlook
the dangling “o” on a complex regexp and be left wondering why the
interpolation isn’t happening like they expect.

I guess it really depends on the scope of your task though.

-Jeremy
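
A minimal sketch of the constant-based alternative, with made-up sample data. TEST_RX, lines, and input are illustrative names only.

```ruby
# Built once when the file is loaded; the loop only references it.
TEST_RX = / test /

lines = ["a test line", "nothing here", "another test case"]
hits = lines.select { |line| line =~ TEST_RX }

# For dynamic content, build the regexp once before the loop instead of
# relying on /o. Regexp.escape keeps the input from being read as syntax.
input = "a.b"
dynamic_rx = Regexp.new("foo:" + Regexp.escape(input))
dynamic_hits = ["foo:a.b", "foo:axb"].select { |line| line =~ dynamic_rx }

puts hits.length
puts dynamic_hits.inspect
```

Either way the construction is visible at one named place, which is the readability argument made above.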

dniq · February 3, 2012, 5:10am

On 2/2/2012 5:20 PM, Dmitry N. wrote:

slower. So I ran some basic benchmarks. Here’s one example:
a*2
Thanks!

$ cat mult.rb
# for a in 0..100000000
#   a*2
# end
require 'rubygems'
require 'inline'

class Multiply
  inline do |builder|
    builder.c "
      long mult(int max) {
        long ctr = 0;
        unsigned long long result = 0;
        while (ctr < max){ result = (ctr++ * 2);}
        return result;
      }"
  end
end

puts ARGV[0]
m = Multiply.new()
start_time = Time.now
a = m.mult(ARGV[0].to_i)
puts a.to_s
end_time = Time.now
# duration is in milliseconds, although the message below says "seconds"
duration = ((end_time.to_f - start_time.to_f) * 1000.0).to_i
puts "You took " + duration.to_s + " seconds."

my linux box
model name : Intel® Core™ i5-2500 CPU @ 3.30GHz quad core
$ ruby -v
ruby 1.8.7 (2011-12-28 patchlevel 357) [x86_64-linux]

[23:05:57] rthompso@raker2>~
$ time ruby mult.rb 999999991
999999991
1999999980
You took 0 seconds.

real 0m0.059s
user 0m0.050s
sys 0m0.008s

my windows xp laptop using cygwin ruby
T7100 @ 1.80GHZ dual core
Reid.Thompson@lt-lat-4960 ~
$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]

Reid.Thompson@lt-lat-4960 ~
$ time ruby rbtest.rb 999999991
999999991
1999999980
You took 0 seconds.

real 0m0.303s
user 0m0.109s
sys 0m0.170s

dniq · February 3, 2012, 11:41am

On Fri, Feb 3, 2012 at 3:14 AM, Dmitry N. wrote:

sys 0m0.004s

My code was merely an example of very simple loop. Its purpose was not
to calculate something, but run through the loop, and execute
multiplication on every iteration.

Yes, and maybe it was a bad example for what you are trying to do:

My main area of development is processing of rather large amounts of
data (billions of entries, primarily processed by regular expressions,
with some statistical analysis on top, and potentially - addition of NLP
later). You have to iterate through every entry of the incoming data
(which might already be in the database, plain text file, or might be
just a “fire hose” of data pouring into the system in real time).

Question: the data needs to come from somewhere. Are you sure that
your processing is CPU bound? If it is IO bound the difference
between Perl and Ruby won’t really show. I reckon it’s better to
create a more realistic example of what you are trying to do and
measure again. (And take care to run tests between Ruby and Perl
alternating in order to prevent OS IO caching from preferring one or
the other.)

Kind regards

robert
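
One rough way to check the CPU-bound question from inside Ruby is to compare CPU time with wall-clock time using the standard Benchmark module. This is only a sketch: cpu_fraction is a hypothetical helper, and the two workloads are stand-ins for real processing.

```ruby
require 'benchmark'

# Hypothetical helper: fraction of wall-clock time spent on the CPU
# (user + system) while the block ran. Values near 1.0 suggest CPU-bound
# work; values near 0.0 suggest waiting on IO, sleep, or the network.
def cpu_fraction
  tms = Benchmark.measure { yield }
  return 0.0 if tms.real.zero?
  (tms.utime + tms.stime) / tms.real
end

busy    = cpu_fraction { 200_000.times { |i| i.to_s =~ /9+/ } }
waiting = cpu_fraction { sleep 0.2 }

puts "regexp loop: #{busy}"
puts "sleeping:    #{waiting}"
```

Wrapping the real read-and-match loop in the same helper would give a first hint before building a more realistic benchmark.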

dniq · February 3, 2012, 11:43am

On Fri, Feb 3, 2012 at 3:21 AM, Dmitry N. wrote:

Ryan D. wrote in post #1043813:

IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed). They’re throwbacks to java devs and
serve no purpose but to make things more verbose. In this specific case,
there are tangible reasons to use =~ over #match.

The reason I tried to use Regexp.new is because I figured it would
pre-compile the regexp - the way “qr/ test /” in Perl would do, so that
it doesn’t have to re-compile it on every iteration.

Not necessary in Ruby: regexp literals are treated specially and are
not recompiled. Usually it’s faster to do

io.each do |line|
  if line =~ /foo/
  end
end

than

rx = /foo/

io.each do |line|
  if line =~ rx
  end
end

If there is dynamic content, use /o:

input = gets

io.each do |line|
  if line =~ /foo:#{input}/o
  end
end

Kind regards

robert
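
A small harness for comparing these variants on your own machine, offered as a sketch rather than a definitive benchmark. The sample lines are generated, input stands in for the value read with gets, and the lambdas add call overhead that is the same for every variant.

```ruby
require 'benchmark'

# Generated sample data: every tenth line contains "foo:bar".
lines = Array.new(50_000) { |i| i % 10 == 0 ? "foo:bar #{i}" : "line #{i}" }
input = "bar"   # stands in for `gets` in the post above
rx = /foo/

variants = {
  "literal"           => lambda { |l| l =~ /foo/ },
  "variable"          => lambda { |l| l =~ rx },
  "interpolated"      => lambda { |l| l =~ /foo:#{input}/ },
  "interpolated + /o" => lambda { |l| l =~ /foo:#{input}/o }
}

results = {}
variants.each do |name, matcher|
  hits = 0
  secs = Benchmark.realtime { lines.each { |l| hits += 1 if matcher.call(l) } }
  results[name] = hits
  puts format("%-18s %6d hits  %.4f s", name, hits, secs)
end
```

All four variants should report the same number of hits; only the timings are expected to differ, and by how much depends on the Ruby version.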

dniq · February 3, 2012, 5:25am

On Fri, Feb 3, 2012 at 12:03 PM, botp wrote:

(3..8).each do |x|
  t=Time.now();for a in 0..10**x;end; puts("#{x} #{Time.now()-t}")
end

3 0.000142406
4 0.001344933
5 0.014539207
6 0.076141941
7 0.737979205
8 7.359555691

pls ignore. i think it is consistent at 10
-botp

dniq · February 3, 2012, 5:29pm

On Thu, Feb 2, 2012 at 23:03, botp wrote:

(3..8).each do |x|
  t=Time.now();for a in 0..10**x;end; puts("#{x} #{Time.now()-t}")
end

3 0.000142406
4 0.001344933
5 0.014539207
6 0.076141941
7 0.737979205
8 7.359555691

Ummmm… yeah… every time you tell it to do ten times as many loops,
it takes almost ten times as long. What’s so surprising?

If you tweak the code to make it say how long each loop took, you’ll
see that it actually gets FASTER at first, presumably due to assorted
constant overhead, then a tiny pinch slower (possibly due to the
switch to a different kind of number) but little enough that IMHO
that’s lost in the noise.

puts("Pwr Tot Secs uS/Loop")
(3..8).each do |x|
  limit = 10**x
  start_time = Time.now()
  for a in 0 .. limit
    # do nothing here, just timing how long the loops take
  end
  elapsed = Time.now() - start_time
  puts(" #{x} #{elapsed} #{elapsed * 1000000.0 / limit} ")
end

will output, with a tiny bit of post-formatting:

Pwr   Tot Secs     uS/Loop
  3    0.000292    0.292
  4    0.001869    0.1869
  5    0.014103    0.14103
  6    0.1058      0.1058
  7    1.051723    0.1051723
  8   10.584073    0.10584073

Of course, as someone mentioned later, the runtime may matter; this is
MRI 1.9.3.

-Dave

dniq · February 3, 2012, 3:06pm

On Feb 2, 2012, at 8:19 PM, Dmitry N. wrote:

unreadable).
Hard to believe that this thread has gone this long without a mention of
other Ruby runtimes.

You may want to also benchmark with JRuby (jruby.org) or with Rubinius
(rubini.us). For ease of installation, you may want to consider using
“rvm” to manage your Rubies (google for it to figure out how to install
it).

cr

dniq · February 3, 2012, 6:13pm

Robert K. wrote in post #1043884:

Question: the data needs to come from somewhere. Are you sure that
your processing is CPU bound? If it is IO bound the difference
between Perl and Ruby won’t really show. I reckon it’s better to
create a more realistic example of what you are trying to do and
measure again. (And take care to run tests between Ruby and Perl
alternating in order to prevent OS IO caching from preferring one or
the other.)

Yep, I’m sure it’s CPU bound: the CPU load is at 100%. Unfortunately,
the data currently comes in faster than the Ruby script can process it.
I’m trying to optimize it, of course, but so far the Perl version
beats Ruby hands down. But there aren’t too many options at my disposal,
it seems. In my examples, the “for” seems to be the major culprit: it
alone, without ANYTHING within the loop, takes 19 seconds to execute 1E8
times. The (0..1E8).each only saves about 1 second for me, which doesn’t
really matter - most of the loops in my scripts are “while” loops
anyway. Still, the regexps themselves run very slowly. I wish Ruby used
standard Perl’s PCRE library - that would make at least regexps run as
fast as they do in Perl, and I would be able to write my scripts in Ruby.

dniq · February 3, 2012, 5:40pm

Jeremy B. wrote in post #1043834:

Thank you for the advice!

If you really need performance in the end, however, you might want to
consider coding your critical code paths in something like C and then
calling those from Ruby as a direct extension or using something like
ffi to call into a DLL containing the logic. Your overall code base may
be a little messy, but sometimes the speed you need requires such a
trade-off. Hopefully, you can keep the mess limited to only a small set
of your overall application logic. Of course, the same holds true for
Perl in this regard.

Well, the performance of Perl has so far been very satisfactory. In
fact, as far as RegExps are concerned - I could barely match Perl’s
performance in C++ (and even then had to mix in some plain C code). So
far, it seems that I’m stuck with Perl. Not that it’s really a bad
thing - I’ve been developing in it since 1997, so I know it pretty well,
while I’ve only spent about 2 weeks with Ruby.

I guess I will have to wait and see if the Ruby interpreter becomes more
efficient. But I have to confess: I’m REALLY tempted to, in some
cases, forgo the performance in favor of handsome code.

dniq · February 3, 2012, 6:17pm

Here’s a question: when I say “for a in ( 0..1E8 )” - does Ruby create
an array and populate it with values from 0 through 1E8, or does it
merely create a counter similar to “for( a = 0; a<=1e8; a++ )” ?

dniq · February 3, 2012, 6:33pm

On Fri, Feb 3, 2012 at 11:17 AM, Dmitry N. wrote:

Here’s a question: when I say “for a in ( 0..1E8 )” - does Ruby create
an array and populate it with values from 0 through 1E8, or does it
merely create a counter similar to “for( a = 0; a<=1e8; a++ )” ?

It might make a counter-based loop internally, but at a higher level
it translates it into an iterator call, here Range#each (a “for” loop
calls #each on whatever object it is given). The iterator knows how to
return the next number in the range each time. It doesn’t create an
array with 1E8+1 elements.
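
A short sketch of that translation. Countdown is a hypothetical class written only to show which method a for loop calls; it is not part of Ruby or of the thread.

```ruby
# Hypothetical class: defines only #each.
class Countdown
  def initialize(from)
    @from = from
  end

  # `for x in obj` is evaluated by calling obj.each with the loop body.
  def each
    @from.downto(1) { |n| yield n }
  end
end

seen = []
for n in Countdown.new(3)
  seen << n
end
puts seen.inspect

# A Range stores only its endpoints; elements are produced on demand.
range = 0..10**8
puts range.class
puts range.first(3).inspect
```

Note that the loop variable n is still visible after the for loop ends, which is one difference from a block passed to each.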

dniq · February 3, 2012, 6:22pm

On Fri, Feb 3, 2012 at 9:40 AM, Dmitry N. wrote:

I guess I will have to wait and see if the Ruby interpreter becomes more
efficient. But I have to confess: I’m REALLY tempted to, in some
cases, forgo the performance in favor of handsome code.

Do you realize that there are multiple Ruby interpreters with
different implementations and strengths? Have you tried your code with
one of the others? The two other major ones are:

JRuby – http://jruby.org/
Rubinius – http://rubini.us/

And a great tool for easily switching between Ruby runtimes:

RVM – http://beginrescueend.com/

Kirk H.

dniq · February 3, 2012, 6:43pm

On Fri, Feb 3, 2012 at 11:13 AM, Dmitry N. wrote:

Yep, I’m sure it’s CPU bound: the CPU load is at 100%. The data comes in


It creates a Range, which just iterates, not an array. A more idiomatic
way would probably be (1+10**8).times { … }

As an aside, if all the processing is happening in the loop, then it
might make more sense that the loop just delegates work out to other
processes (e.g. parse a line or process a parsed set of data). This
could be pretty simple if done with a thread pool in a single Ruby
script (you’ll want one of the alternate implementations here since
you’re CPU bound and MRI has a GIL), or as arbitrarily complex as you
like.
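
A minimal sketch of the thread-pool idea using only the standard Queue and Thread classes. WORKER_COUNT, LINE_RX, and the generated lines are assumptions for illustration; on MRI the GIL means this will not speed up CPU-bound matching, so it mainly shows the structure.

```ruby
require 'thread'  # provides Queue on 1.9; harmless on newer Rubies

WORKER_COUNT = 4      # assumed pool size
LINE_RX = /test/      # assumed pattern

jobs    = Queue.new
results = Queue.new

# Generated input: every even-numbered line contains "test".
lines = Array.new(1_000) { |i| i.even? ? "a test line #{i}" : "other #{i}" }
lines.each { |line| jobs << line }
WORKER_COUNT.times { jobs << :done }   # one stop marker per worker

workers = Array.new(WORKER_COUNT) do
  Thread.new do
    # Each worker pulls lines until it sees its stop marker.
    while (line = jobs.pop) != :done
      results << line if line =~ LINE_RX
    end
  end
end
workers.each(&:join)

puts results.size
```

The same producer/worker shape carries over to separate processes or machines, with the queue replaced by whatever transport connects them.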

dniq · February 3, 2012, 7:12pm

Josh C. wrote in post #1043970:

It creates a Range, which just iterates, not an array. A more idiomatic
way would probably be (1+10**8).times { … }

Phew…

As an aside, if all the processing is happening in the loop, then it
might make more sense that the loop just delegates work out to other
processes (e.g. parse a line or process a parsed set of data). This
could be pretty simple if done with a thread pool in a single Ruby
script (you’ll want one of the alternate implementations here since
you’re CPU bound and MRI has a GIL), or as arbitrarily complex as you
like.

Yeah, that’s how it works in my Perl version - it all runs on Amazon,
with workload delegated to “worker” servers in a MapReduce-like fashion,
using Redis for inter-server communication.

dniq · February 3, 2012, 6:35pm

Kirk H. wrote in post #1043963:

Do you realize that there are multiple Ruby interpreters with
different implementations and strengths? Have you tried your code with
one of the others? The two other major ones are:

JRuby – http://jruby.org/
Rubinius – http://rubini.us/

Yeah, I saw it in another post - I’ll give them a try.

Thanks!

dniq · February 3, 2012, 7:56pm

Tried rubinius and jruby. Rubinius so far is the fastest one, but still
slower than Perl. The empty “for” loop runs about as fast as “while” or
.each (10 seconds - rubinius, 3.5 seconds - Perl), although .times
takes only 6 seconds.

Regexp match using / test /.match works about the same as s =~ / test /,
and is about 5 seconds, vs. Perl’s 1.4s (1e7 repetitions), although,
seeing as there’s such a huge difference in just empty loop alone it’s
hard to tell if it’s because regexps themselves are slower in Ruby, or
if it’s because of the regexp engine…

jruby is just as slow as regular ruby 1.9.3.
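
For anyone wanting to repeat the loop comparison on another runtime, a sketch along these lines should work. N is scaled down from the 1e8 used in the thread, and the counters exist only so each loop body does the same small amount of work.

```ruby
require 'benchmark'

N = 200_000  # far smaller than the 1e8 used in the thread

counts  = Hash.new(0)
timings = {}

timings["for"] = Benchmark.realtime do
  for a in 0...N
    counts["for"] += 1
  end
end

timings["each"]  = Benchmark.realtime { (0...N).each { counts["each"] += 1 } }
timings["times"] = Benchmark.realtime { N.times { counts["times"] += 1 } }

timings["while"] = Benchmark.realtime do
  i = 0
  while i < N
    counts["while"] += 1
    i += 1
  end
end

timings.each { |name, secs| puts format("%-6s %.4f s", name, secs) }
```

Running the same file under MRI, JRuby, and Rubinius gives a like-for-like comparison; JIT-based runtimes may need a warm-up pass before the numbers settle.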

dniq · February 3, 2012, 7:59pm

Curious: rubinius reports itself as 2.0.0dev ( 1.8.7 ), which is strange

Ruby 1.8.7 does not support \p{} regexps (like \p{Alnum} for example),
and rubinius does…
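
A quick way to check what a given runtime reports and whether it accepts \p{} classes, as a sketch. It assumes a 1.9-or-later regexp engine; on an engine without \p{} support the regexp literal itself would fail to parse.

```ruby
# Report the language version and, where available, the engine name.
puts RUBY_VERSION
puts defined?(RUBY_ENGINE) ? RUBY_ENGINE : "RUBY_ENGINE not defined"

# \p{Alnum} matches letters and digits.
alnum_only  = "abc123"  =~ /\A\p{Alnum}+\z/
with_space  = "abc 123" =~ /\A\p{Alnum}+\z/

puts alnum_only.inspect
puts with_space.inspect
```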

dniq · February 4, 2012, 2:37am

On Sat, Feb 4, 2012 at 12:27 AM, Dave A. wrote:

7 0.737979205
8 7.359555691

Ummmm… yeah… every time you tell it to do ten times as many loops,
it takes almost ten times as long. What’s so surprising?

i noticed the mult 10 too late. what i was emphasizing is that, given
the simple loop above, your point of acceptance should be less than
10**7. otherwise, beyond that, you’d get unacceptable response time.
just imagine, 7 seconds! this would not be acceptable for database
apps for example w response times of less than 5 seconds.

kind regards -botp