For or each?

tekwiz · September 21, 2008, 2:35am

_why wrote:

You folks can argue all you want about the look of the for but
you’re forgetting the utility of having two nice choices.

Tx, that’s why I said ‘for’ can be more readable - even though I know
nobody in person who is aware of its existence.

One which
creates scope and one that doesn’t. Don’t let this Roodi lib boss
you around! You can make up your own mind about things.

Classic ‘lint’ comes with switches to STFU some warnings, thus making
others
more useful…
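
A minimal sketch of the scope difference _why mentions, assuming Ruby 1.9 or later. The method and variable names (after_for, after_each, last_seen, inner) are made up for illustration and are not from the thread:

```ruby
# for: the loop variable and any locals assigned in the body
# belong to the enclosing scope, so they are still visible afterwards.
def after_for
  for i in [1, 2, 3]
    last_seen = i
  end
  [i, last_seen]
end

# each: the block parameter and locals first assigned inside the block
# are gone once the block finishes.
def after_each
  [1, 2, 3].each do |j|
    inner = j
  end
  [defined?(j), defined?(inner)]
end

p after_for   # => [3, 3]
p after_each  # => [nil, nil]
```

Whether the leaked variables count as a feature or a hazard is exactly the choice being argued about here.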

tekwiz · September 21, 2008, 5:32am

Xavier N. wrote:

It is so unlikely that an #each becomes an
#inject that…

You have probably never pair-programmed with me during a savage
refactoring session.
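
For what it's worth, here is a rough sketch of the kind of refactor being joked about: an accumulating #each turned into an #inject. The method names are hypothetical and only there for illustration:

```ruby
# Before: accumulate into an outer variable with each.
def sum_with_each(numbers)
  total = 0
  numbers.each { |n| total += n }
  total
end

# After: the same computation, with inject carrying the accumulator.
def sum_with_inject(numbers)
  numbers.inject(0) { |total, n| total + n }
end

p sum_with_each([1, 2, 3])    # => 6
p sum_with_inject([1, 2, 3])  # => 6
```

Starting from a `for` loop, the same step means first rewriting the loop as a block, which is the point about #each leaving you closer to the refactor.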

tekwiz · September 21, 2008, 5:41am

Phlip wrote:

Joe Wölfel wrote:

It’s interesting that array access using ‘each’ seems to be much
faster on my machine.

Does ‘for’ reevaluate its range after each tick? That would give
‘for’ a single technical advantage over .each, in the very rare
chance you need that.

No:

class Numeric
  def foo
    puts "foo"
    self
  end
end
for i in 1..10.foo
  puts i
end

prints:

foo
1
2
3
4
5
6
7
8
9
10
=> 1..10

tekwiz · September 21, 2008, 3:37am

On 20 sept. 08, at 20:10, _why wrote:

Actually for loops are faster than each. Since it doesn’t
introduce a block, there’s no extra scope created. Not much faster,
but they are used in the computer language shootout, for instance.
_why

Any thoughts on the Ruby 1.9 result? It seems like ‘each’ is much
faster than either the original for loop we started talking about, or
the non-index based one (for loop 2). You can run the code below if
you like.

Ruby 1.9 output:
Rehearsal --------------------------------------------------
for loop         1.240000   0.000000   1.240000 (  1.240751)
for loop 2       1.180000   0.000000   1.180000 (  1.176560)
each             0.510000   0.010000   0.520000 (  0.512496)
----------------------------------------- total: 2.940000sec

                     user     system      total        real
for loop         1.190000   0.010000   1.200000 (  1.197695)
for loop 2       1.180000   0.000000   1.180000 (  1.177620)
each             0.510000   0.000000   0.510000 (  0.508858)

Ruby 1.8.6 output:
Rehearsal --------------------------------------------------
for loop         2.840000   0.040000   2.880000 (  2.880437)
for loop 2       1.660000   0.000000   1.660000 (  1.661229)
each             1.750000   0.010000   1.760000 (  1.755502)
----------------------------------------- total: 6.300000sec

                     user     system      total        real
for loop         2.300000   0.000000   2.300000 (  2.307566)
for loop 2       1.660000   0.010000   1.670000 (  1.666218)
each             1.760000   0.000000   1.760000 (  1.760356)

require 'rubygems'
require 'benchmark'

a = (1..10000).to_a
Benchmark.bmbm 15 do |bench|

  # Index-based for loop: iterate over the indices and look up each element
  bench.report "for loop" do
    x = 0
    100.times do
      for i in 0...(a.size)
        x += a[i]
      end
    end
  end

  # for loop directly over the array
  bench.report "for loop 2" do
    x = 0
    100.times do
      for i in a
        x += i
      end
    end
  end

  # each with a block
  bench.report "each" do
    x = 0
    100.times do
      a.each do |i|
        x += i
      end
    end
  end
end

tekwiz · September 21, 2008, 5:49pm

Phlip wrote:

‘for’ is arguably more readable. And it’s not a performance issue - I
suspect the opcodes will be the same. It is very much a technical issue.

For is generally faster because it doesn’t instantiate a new scope for
each loop. It shares the same scope as its containing block of code,
where each creates a new scope each time the block is called. I’m
reasonably sure this applies to all Ruby versions too.

Charlie

tekwiz · September 21, 2008, 5:34pm

On Sun, Sep 21, 2008 at 2:54 AM, Joe Wölfel wrote:

each 1.760000 0.000000 1.760000 ( 1.760356)
The relative speeds of for (2) and each are similar in JRuby. That 1.9
thing is intriguing.

tekwiz · September 21, 2008, 7:08pm

It’s hard to predict performance in advance. I think it’s gotten
harder as processors have become more complicated and there are more
alternatives for running code. I’m just guessing here, but maybe
blocks are so critical to ruby performance that ‘each’ has benefited
from attempts to optimize the performance of blocks in general.

-Joe

tekwiz · September 21, 2008, 5:52pm

Joe Wölfel wrote:

You can try my test if you like. I haven’t checked it carefully. But
it does seem like ‘each’ is much faster under both ruby 1.8.6 and ruby
1.9. The code is below.

Odd, now that I try it, each is faster than for. I’m going to have to
investigate.

JRuby:
                user     system      total        real
for loop    0.334000   0.000000   0.334000 (  0.334033)
each        0.236000   0.000000   0.236000 (  0.235695)

Ruby 1.8.6:
                user     system      total        real
for loop    1.050000   0.010000   1.060000 (  1.079704)
each        0.950000   0.010000   0.960000 (  0.982576)

Ruby 1.9:
                user     system      total        real
for loop    0.910000   0.010000   0.920000 (  0.933510)
each        0.420000   0.000000   0.420000 (  0.431548)

Charlie

tekwiz · September 22, 2008, 12:04pm

On Sun, Sep 21, 2008 at 07:12:06AM +0900, Phlip wrote:

tekwiz wrote:

It leaves you closer to a refactor to .map or .inject or .select or
.reject or .delete_if or .each_index or .each_with_index or …

So, it’s a code-readability issue and not a functional or complexity
issue?

‘for’ is arguably more readable. And it’s not a performance issue - I
suspect the opcodes will be the same. It is very much a technical issue.

Nope. each { } needs one new scope for each iteration while “for … in”
explicitly uses the parent scope… In the end, with #each you create
as many objects as there are in your collection - which can be a huge
performance hit in some cases. That’s why I personally use “for … in”
in performance-critical parts of my code, to avoid unnecessary GC.

Sylvain

tekwiz · September 22, 2008, 12:08pm

On Mon, Sep 22, 2008 at 10:26 AM, Sylvain J. wrote:

Nope. each { } needs one new scope for each iteration while “for … in”
explicitly uses the parent scope… In the end, you create with #each
as many objects as there are in your collection - which can be a huge
performance hit in some cases.

In what sense does #each create objects? Assuming a block with just
one parameter, do you mean there’s a new reference to an existing object
per iteration? Would that really affect GC that much?

tekwiz · September 22, 2008, 12:30pm

Phlip wrote:

_why wrote:

You folks can argue all you want about the look of the for but
you’re forgetting the utility of having two nice choices.

Tx, that’s why I said ‘for’ can be more readable - even though I know
nobody in person who is aware of its existence.

FWIW, the skeleton code which Rails generates uses the ‘for’ loop.

$ rails wombat
$ cd wombat
$ script/generate scaffold flurble
$ cat app/views/flurbles/index.html.erb
…
<% for flurble in @flurbles %>

.. <% end %> ...

tekwiz · September 22, 2008, 12:50pm

Sylvain J. wrote:

Nope. each { } needs one new scope for each iteration

By ‘scope’ do you mean ‘stack frame’?

Take this trivial example:

def each1
  yield 1
  yield 2
  yield 3
end
each1 { :dummy }

The only ‘objects’ being created here are stack frames, by the yield
statements calling the block. To make it more explicit,

def each2(&blk)
  blk.call(1)
  blk.call(2)
  blk.call(3)
end
each2 { :dummy }

Now, if you are arguing that the above code creates three objects which
need to be garbage-collected later, then you’re also arguing that the
sequence

foo(1)
foo(2)
foo(3)

creates three objects which need to be garbage-collected, and therefore
that the loop

for i in (1..3)
  foo(i)
end

also creates three garbage objects.

I don’t believe that’s the case. I would imagine that the stack runs as,
well, a stack. (It’s not quite that simple when you get into creating
closures of course, but if you call a closure 1000 times, you’re not
creating 1000 new closures)

Am I missing something?

Finally, I tried some simple measurements.

$ time ruby -e 'a = (1..5_000_000).to_a; a.each { :dummy }'
$ time ruby -e 'a = (1..5_000_000).to_a; for i in a; :dummy; end'

Under ruby 1.8.6p114, I find the first is about 5% faster.

Under ruby 1.8.4 (Ubuntu Dapper), I find the first is about 25% faster.

This is on relatively old Pentium machines though.

tekwiz · September 22, 2008, 1:00pm

Sylvain J. wrote:

Nope. each { } needs one new scope for each iteration while “for … in”
explicitly uses the parent scope… In the end, you create with #each
as many objects as there are in your collection

P.S. Here’s a simple experiment, and I can’t see any of these
dark-matter objects that you talk about.

def countobj
  count = 0
  ObjectSpace.each_object(Object) { count += 1 }
  count
end

def foo
  :dummy
end

puts "#{countobj} objects"
GC.disable
(1..1_000_000).each { foo }
puts "#{countobj} objects"
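
On a current Ruby the same question can be approached without walking ObjectSpace. The sketch below assumes Ruby 2.2 or later, where GC.stat accepts the :total_allocated_objects key. That is an assumption about the runtime, not something available on the 1.8 interpreters discussed in this thread. The helper name allocations is made up, and no particular result is claimed:

```ruby
# Count how many objects the interpreter reports as allocated
# while the given block runs. GC is disabled during the measurement.
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

def foo
  :dummy
end

each_allocs = allocations { (1..100_000).each { |i| foo } }
for_allocs  = allocations { for i in 1..100_000; foo; end }
puts "each: #{each_allocs} objects, for: #{for_allocs} objects"
```

The numbers will differ between interpreter versions, so treat them as a measurement of your own setup and not as evidence about 1.8.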

tekwiz · September 22, 2008, 3:59pm

On Mon, Sep 22, 2008 at 07:42:24PM +0900, Brian C. wrote:

also creates three garbage objects.

Yes. Except that

collection.each do |obj|
  foo(obj)
end

creates twice as many as

for obj in collection
  foo(obj)
end

which is what we are trying to compare here.

I don’t believe that’s the case. I would imagine that the stack runs as,
well, a stack. (It’s not quite that simple when you get into creating
closures of course, but if you call a closure 1000 times, you’re not
creating 1000 new closures)

Am I missing something?

Yes. I don’t know about the current version of the 1.9 VM, but currently
the interpreter does not ‘reuse’ stack frames, and therefore you do
create one object per new scope. To cut down the discussion,
ObjectSpace#each_object does not give you those, so you can’t count
them with it.

Finally, I tried some simple measurements.

$ time ruby -e 'a = (1..5_000_000).to_a; a.each { :dummy }'
$ time ruby -e 'a = (1..5_000_000).to_a; for i in a; :dummy; end'

Under ruby 1.8.6p114, I find the first is about 5% faster.

Under ruby 1.8.4 (Ubuntu Dapper), I find the first is about 25% faster.

Interesting. I have the same results, but (on my machine) the following
is even slower:
$ time ruby -e 'a = (1..5_000_000).to_a; a.each { |i| :dummy }'
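
The three variants timed above can also be compared inside one process, which keeps interpreter startup and the array construction out of the measurement. This is only a sketch: time_variants is a hypothetical helper, and it uses Benchmark.realtime from the standard library:

```ruby
require 'benchmark'

# Time the three loop forms over the same array and return
# a hash of label => elapsed seconds.
def time_variants(n)
  a = (1..n).to_a
  timings = {
    'each { }'     => Benchmark.realtime { a.each { :dummy } },
    'each { |i| }' => Benchmark.realtime { a.each { |i| :dummy } },
    'for ... in'   => Benchmark.realtime { for i in a; :dummy; end }
  }
  timings
end

time_variants(1_000_000).each do |label, seconds|
  puts format('%-14s %.3fs', label, seconds)
end
```

A single run like this is noisy. Benchmark.bmbm, as used earlier in the thread, adds a rehearsal pass and is the better tool for anything you intend to quote.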

tekwiz · September 22, 2008, 3:51pm

On Mon, Sep 22, 2008 at 07:52:47PM +0900, Brian C. wrote:

ObjectSpace.each_object(Object) { count += 1 }
puts "#{countobj} objects"

Thanks but nope…

each_object specifically filters out objects that are internal to the
interpreter, therefore you don’t see those here. For that, you actually
need a better object-counting setup. See Ruby patches for one that I
submitted. I hope that the “new Ruby GC profiler” that has been
included in 1.9 will provide the same amount of information.

Sylvain

tekwiz · September 24, 2008, 10:37pm

Come on! Is it so hard to realize that foo.each {|i| …} is used
more simply because it fits the Ruby all-OO mindset better? Like
Foo.new instead of the more common new Foo?

In search of purity, everything is an object and all actions are
method calls. It feels very much like Scheme and its (sweet)
obsession for doing everything out of functions and function
application…

Rubyists are as used to postfix notation as Forth users are, or as
Lispers are to prefix…
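
To make the "all actions are method calls" point concrete: `for … in` is itself dispatched through #each, so it works on any object that defines that method. A small sketch, where the Countdown class and collect_with_for helper are hypothetical names invented for illustration:

```ruby
# Any object with an each method can be the target of a for loop.
class Countdown
  def initialize(from)
    @from = from
  end

  # Yield from, from - 1, ..., 1
  def each
    @from.downto(1) { |n| yield n }
  end
end

# Uses the for syntax; Ruby calls source.each behind the scenes.
def collect_with_for(source)
  seen = []
  for n in source
    seen << n
  end
  seen
end

p collect_with_for(Countdown.new(3))  # => [3, 2, 1]
```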

tekwiz · September 26, 2008, 2:33pm

You should look at the Matz book, The Ruby Programming Language, page 137:

external or internal iterators
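
As a rough illustration of the two terms (not taken from the book): with an internal iterator the collection drives the loop and calls your block, while with an external iterator the caller pulls each element itself. The sketch assumes Ruby 1.9 or later for Enumerator#next, and the method names are made up:

```ruby
# Internal iterator: the array controls the iteration and yields to the block.
def internal_sum(numbers)
  total = 0
  numbers.each { |n| total += n }
  total
end

# External iterator: the caller asks for the next element explicitly.
# Kernel#loop stops quietly when next raises StopIteration.
def external_sum(numbers)
  enum = numbers.each   # each without a block returns an Enumerator
  total = 0
  loop { total += enum.next }
  total
end

p internal_sum([1, 2, 3])  # => 6
p external_sum([1, 2, 3])  # => 6
```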

tekwiz · September 22, 2008, 4:16pm

Sylvain J. wrote:

To cut down the discussion,
ObjectSpace#each_object does not give you those, so you can’t count
them with it.

OK, I see that certain objects are not yielded, including T_SCOPE.

On the other hand, observe that the following program doesn’t leak
memory:

def foo
  :dummy
end

GC.disable
(1..10_000_000).each { |i| foo }
puts `ps auxwww | grep ruby | grep -v grep`
puts "Press enter"
STDIN.gets

On the two machines I tried the RSS is 3MB, regardless of how big I make
the loop. (That’s 1.8.4 stock Ubuntu Dapper, and 1.8.6p114 compiled from
source)

So are you sure a scope is created every time round the loop - not just
on the first invocation?

The following program does consume memory:

def foo
  :dummy
end

GC.disable
(1..1_000_000).each { |x|
  (1..1).each { |y| foo }
}
puts `ps auxwww | grep ruby | grep -v grep`
puts "Press enter"
STDIN.gets

That is: I’m happy to accept that every time the inner loop starts, it
creates a new scope (since the block is a closure with a different value
of x bound to it)

But if I change the inner loop to

(1..10).each { |y| foo }

the memory consumption is the same. So whether a block is invoked once
or 10 times makes no difference to memory usage.