FasterGenerator (#66)

On 2/11/06, Jacob F. [email protected] wrote:

On 2/10/06, Ruby Q. [email protected] wrote:

This week’s Ruby Q. is to write FasterGenerator, your own
re-implementation of Generator with an eye towards working faster.

Here are the benchmark results from my own implementation:

Ok, since last night I’ve corrected my implementation to not evaluate
ahead, as per H. Yamamoto’s and Luke B.'s comments. I had
originally done this, then removed it for time savings. For
correctness (which Luke’s example brought to mind), I’ve reintroduced
it.

I also made the changes to the benchmark that Christoffer Lernö
suggested. Due to the slowness of the current generator (more
specifically, the current generator on my computer) I had to drop the
test constants in order to get it to finish in a reasonable time.

Here are my current benchmark results, with my benchmark code and
current implementation attached:

galadriel:~/ruby/qotw/66$ ruby benchmark.rb

Construction

Rehearsal -------------------------------------------------------------
Old callcc Generator          0.070000   0.140000   0.210000 (  0.323153)
lukfugl's FasterGenerator     0.010000   0.000000   0.010000 (  0.003346)
---------------------------------------------------- total: 0.220000sec

                                  user     system      total        real
Old callcc Generator          0.140000   0.090000   0.230000 (  0.354175)
lukfugl's FasterGenerator     0.000000   0.000000   0.000000 (  0.004274)

next()

Rehearsal -------------------------------------------------------------
Old callcc Generator        300.130000 182.660000 482.790000 (495.950825)
lukfugl's FasterGenerator     5.550000   0.050000   5.600000 (  5.751610)
-------------------------------------------------- total: 488.390000sec

                                  user     system      total        real
Old callcc Generator        225.030000 185.010000 410.040000 (425.286992)
lukfugl's FasterGenerator     5.690000   0.020000   5.710000 (  5.821117)

Jacob F.

On Feb 12, 2006, at 11:10 AM, Jacob F. wrote:

Ok, since last night I’ve corrected my implementation to not evaluate
ahead, as per H. Yamamoto’s and Luke B.'s comments.

Mine does, of course, pull all the items in at once and has all the
problems discussed with that approach. Attached is my implementation
and the altered versions I have been using in benchmarks.

James Edward G. II

On Feb 12, 2006, at 19:23, James Edward G. II wrote:

Mine looks like a clone of James’:

class MyGenerator
  attr_reader :index

  def initialize(enum = nil)
    if enum then
      @array = enum.to_a
    else
      @array = Array.new
      yield self
    end
    @index = 0
  end

  def current
    raise EOFError unless next?
    @array[@index]
  end

  def next
    value = current
    @index += 1
    return value
  end

  def next?
    return @index < @array.length
  end

  def rewind
    @index = 0
    self
  end

  def each(&block)
    @array.each(&block)
  end

  def yield(value)
    @array << value
  end

  def pos
    return @index
  end

  def end?
    return !next?
  end
end

/Christoffer

Jacob F. wrote:


expect, and it has nothing to do with an initial advance or anything.
be done.

You are correct. My bad.

My point is that the API is pointlessly difficult. I think, actually,
that the C# iterator API is the cleanest I’ve seen: there is one
operation that attempts to advance to the next position, returning false
if it has reached the end, and another that returns the current value,
throwing an exception if it’s still before the first advance.
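
For illustration, here is a minimal Ruby sketch of the two-operation API
Luke describes; the class and method names are only stand-ins for C#'s
MoveNext/Current pair, not anything from a posted solution:

class ArrayCursor
  def initialize(array)
    @array = array
    @index = -1                     # before the first advance
  end

  # Try to advance; return false once the end has been reached.
  def move_next
    @index += 1
    @index < @array.length
  end

  # Current value; raise if we have not advanced yet.
  def current
    raise EOFError, "no current element yet" if @index < 0
    @array[@index]
  end
end

cursor = ArrayCursor.new([1, 2, 3])
values = []
values << cursor.current while cursor.move_next
values    # => [1, 2, 3]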

And my other point is that, while you’re going down the road of
coroutines, you might as well go the whole way.

Luke B.

On Mon, 2006-02-13 at 02:10 +0900, Jacob F. wrote:

Ok, since last night I’ve corrected my implementation to not evaluate
ahead, as per H. Yamamoto’s and Luke B.'s comments.

I did the same, and it’s pretty much killed my implementation :(. It
turns out to be quite similar to the new standard generator
implementation, but I didn’t look beforehand, I promise :) I’ve used
this kind of setup before in Java. Previously I had it running the
block in variable-size chunks, with an optimisation for non-block usage
when an enum that responds to :length is supplied to the constructor.
Like in those performance tests ;)

Anyway, it now satisfies the extra conditions discussed, I think, but
its performance isn’t too hot anymore. In particular, threading seems
to be fairly slow on 1.8 anyway. I also ran it under 1.9 against the
1.9 generator for comparison (it’s a sliver quicker, but I suspect
that’s because mine isn’t very robust). These numbers are with 100
tests and 1000 elements, as used in Jacob’s benchmark.

(Incidentally, Jacob, under 1.8 yours was testing about twice as fast
as mine, but it seems to deadlock about half of the time? I have to
interrupt it after leaving it for up to a minute. I’m on Ruby 1.8.4
i686-linux.)

$ ruby bench.rb

Construction

Rehearsal --------------------------------------------------------------
1.8.4-2005-12-24 Generator 0.020000 0.010000 0.030000 ( 0.111110)
My generator 0.010000 0.000000 0.010000 ( 0.065179)
----------------------------------------------------- total: 0.040000sec

                             user     system      total        real

1.8.4-2005-12-24 Generator 0.000000 0.010000 0.010000 ( 0.015034)
My generator 0.010000 0.000000 0.010000 ( 0.010767)

next()

Rehearsal --------------------------------------------------------------
1.8.4-2005-12-24 Generator 19.640000 0.420000 20.060000 ( 22.134238)
My generator 10.720000 0.090000 10.810000 ( 11.349895)
---------------------------------------------------- total: 30.870000sec

                             user     system      total        real

1.8.4-2005-12-24 Generator 19.310000 0.250000 19.560000 ( 20.453355)
My generator 9.950000 0.060000 10.010000 ( 10.507884)

$ ruby9 bench.rb

Construction

Rehearsal --------------------------------------------------------------
1.9.0-2006-02-13 Generator 0.000000 0.000000 0.000000 ( 0.057187)
My generator 0.010000 0.000000 0.010000 ( 0.005367)
----------------------------------------------------- total: 0.010000sec

                             user     system      total        real

1.9.0-2006-02-13 Generator 0.000000 0.000000 0.000000 ( 0.003844)
My generator 0.010000 0.000000 0.010000 ( 0.004790)

next()

Rehearsal --------------------------------------------------------------
1.9.0-2006-02-13 Generator 3.780000 0.030000 3.810000 ( 4.021680)
My generator 3.610000 0.020000 3.630000 ( 3.758884)
----------------------------------------------------- total: 7.440000sec

                             user     system      total        real

1.9.0-2006-02-13 Generator 3.950000 0.030000 3.980000 ( 4.021709)
My generator 3.590000 0.030000 3.620000 ( 3.762839)

This submission:

  • passes the tests
  • runs a couple of orders of magnitude faster than the continuation
    generator using the provided bench
  • runs the generator block in a separate thread
  • is intended to handle both stateful and stateless (functional)
    generator blocks
  • provides a Generator::stateless constructor that allows the
    generator block to run for more than one callback to Generator#yield
  • is uncommented

thanks,
Dave

Hi,

In message “Re: [QUIZ] FasterGenerator (#66)”
on Fri, 10 Feb 2006 22:53:06 +0900, Ruby Q.
[email protected] writes:

|This week’s Ruby Q. is to write FasterGenerator, your own re-implementation of
|Generator with an eye towards working faster. (This is a small library. My
|version is 54 lines.) It is possible to go even faster than the new
|implementation, with certain trade-offs:

I’d be happy to replace the current implementation of generator.rb with
the winning one.

						matz.

Here’s my solution for this week’s quiz. Normally, I just watch the
quizzing, but this one happens to be one that I’m already doing, as
part of a larger library of external iterators.

The key insight is to realize that Continuations were used in the
original because an independent call stack is needed (to run #each
in), separate from the call stack of the user of Generator. However,
Continuations are only one way to get another call stack; you could
also use a Thread, which doesn’t have the same performance problems
in Ruby.
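
As a rough illustration of that idea (a sketch under stated
assumptions, not Caleb’s actual code), a thread can run #each on its
own stack and hand values over a SizedQueue:

require 'thread'

class ThreadedGenerator
  DONE = Object.new   # sentinel marking the end of the enumeration

  def initialize(enum)
    @queue  = SizedQueue.new(32)   # bounded, so the producer can't race ahead unbounded
    @thread = Thread.new do
      enum.each { |x| @queue.push(x) }
      @queue.push(DONE)
    end
    # NOTE: popping here means we read one element ahead -- the very
    # caveat about evaluating ahead discussed earlier in the thread.
    @next = @queue.pop
  end

  def next?
    !@next.equal?(DONE)
  end

  def next
    raise EOFError, 'no more elements available' unless next?
    value = @next
    @next = @queue.pop
    value
  end
end

g = ThreadedGenerator.new(1..5)
g.next while g.next?    # consumes 1, 2, 3, 4, 5 in order

Caleb’s Queue#peek (below) and the synchronization schemes later in the
thread are ways to avoid that one-element read-ahead.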

In order to implement Generator#current, I had to invent Queue#peek,
which tells you what the next element is without taking it from the
Queue. Actually, I’m using a SizedQueue, not a plain Queue. Otherwise,
the memory used by the queue could grow without bounds.
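
One plausible way to write the #peek he mentions (his actual version is
not shown here, and reaching into @que relies on how 1.8’s thread.rb
happens to store the elements internally):

require 'thread'

class Queue
  # Return the head element without removing it, spinning until the
  # producer has pushed something.
  def peek
    Thread.pass while empty?
    @que.first
  end
end

Since SizedQueue inherits from Queue, the same method serves both.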

#rewind was also a small challenge, until I realized that you could
just restart the Thread. (Hopefully, the enum or block will return
the same results the second time through.)

It’s not allowed in the original, but this version permits you to
pass both an enum and a block to the generator. The results of the
block are passed up once the enum is exhausted.

Another interesting new feature is Generator#<<, which allows you to
add more items to the Generator once it has been created. This was
originally a misunderstood implementation of yield.
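
A hypothetical use of that extension (only the method name comes from
the description above; the exact semantics are an assumption):

g = Generator.new([1, 2])
g << 3        # append another item after construction
g.next        # => 1
g.next        # => 2
g.next        # => 3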

It’s clear that this version is faster than the original callcc-
based Generator, but I’m not sure how to compare with James’
results. I was unable to run his benchmark to completion on my
machine. Somewhat modified, I see differences of around 3 orders
of magnitude, but the performance of the callcc-based version seems
non-linearly dependent on input size.

I also found that bumping the queue size up to 400 from the original
32 made about a 4x difference in running time. I guess context
switches are expensive…

I am curious to see how James made his own solution so blazing fast.

The requirement for synchronous generators took me by surprise at
the last minute. It was easy enough to add synchronicity, but it
looks like there would be a performance cost from the extra context
switches, and it is probably not needed in the majority of cases.
So I put the synchronous capability in a subclass. Sure enough,
when I ran the benchmark, the synchronous version pretty much
wipes out any performance gain from the bigger queue.

Benchmarks: (take with a grain of salt)

Construction

Rehearsal --------------------------------------------------------------
Caleb's Generator               0.020000   0.000000   0.020000 (  0.015167)
Caleb's Synchronous Generator   0.000000   0.000000   0.000000 (  0.003251)
Old callcc Generator            0.000000   0.000000   0.000000 (  0.004067)
----------------------------------------------------- total: 0.020000sec

                                    user     system      total        real
Caleb's Generator               0.010000   0.000000   0.010000 (  0.014414)
Caleb's Synchronous Generator   0.010000   0.000000   0.010000 (  0.003384)
Old callcc Generator            0.000000   0.000000   0.000000 (  0.004027)

next()

Rehearsal --------------------------------------------------------------
Caleb's Generator               0.050000   0.000000   0.050000 (  0.092768)
Caleb's Synchronous Generator   0.270000   0.000000   0.270000 (  0.306566)
each                            0.000000   0.000000   0.000000 (  0.000732)
Old callcc Generator            8.410000   0.960000   9.370000 ( 10.738060)
----------------------------------------------------- total: 9.690000sec

                                    user     system      total        real
Caleb's Generator               0.030000   0.000000   0.030000 (  0.069023)
Caleb's Synchronous Generator   0.200000   0.000000   0.200000 (  0.256574)
each                            0.000000   0.000000   0.000000 (  0.000392)
Old callcc Generator            7.400000   0.960000   8.360000 (  8.679449)

On 2/12/06, Dave L. [email protected] wrote:

This submission:

  • passes the tests
  • runs a couple of orders of magnitude faster than the continuation
    generator using the provided bench
  • runs the generator block in a separate thread
  • is intended to handle both stateful and stateless (functional)
    generator blocks
  • provides a Generator::stateless constructor that allows the
    generator block to run for more than one callback to Generator#yield
  • is uncommented

forgot one thing:

  • intended to handle infinite generators.

Dave

I’d been wanting to mess with writing Ruby extensions, so I wrote the
whole thing in C. I do realize one thing already: I’m using the
‘entries’ method to pull out what to iterate over instead of #each.
This comes from something that I’m not totally clear on: the contract
is that we receive an Enumerable object, so can we assume that this
method will be implemented, since it’s part of the Enumerable mixin?
(See the short note after the code.) Anyways, here’s the code:

#include "ruby.h"

static ID id_entries;
typedef struct _gen
{
int curr;
VALUE values;
} Generator;

//Is there a built in way to do this?
#define TEST(t) t?Qtrue:Qfalse

static VALUE t_init(int argc, VALUE *argv, VALUE self)
{
Generator *gen;
Data_Get_Struct(self, Generator, gen);
if(argc > 0)
{
VALUE arr = rb_funcall(argv[0], id_entries, 0);
gen->values = arr;
}
else
{
VALUE arr = rb_ary_new();
gen->values = arr;
rb_yield(self);
}
gen->curr = 0;
return self;
}

static VALUE t_end_q(VALUE self)
{
Generator* gen;
Data_Get_Struct(self, Generator, gen);
int size = RARRAY(gen->values)->len;
int curr = gen->curr;
return TEST(curr >= size);
}

static VALUE t_next_q(VALUE self)
{
return TEST(!t_end_q(self));
}

static VALUE t_pos(VALUE self)
{
Generator* gen;
Data_Get_Struct(self, Generator, gen);
int curr = gen->curr;
return INT2NUM(curr);
}

static VALUE t_rewind(VALUE self)
{
Generator* gen;
Data_Get_Struct(self, Generator, gen);
gen->curr = 0;
return self;
}

static VALUE t_yield(VALUE self, VALUE element)
{
Generator* gen;
Data_Get_Struct(self, Generator, gen);
rb_ary_push(gen->values, element);
return gen->values;
}

static VALUE t_current(VALUE self)
{
if(t_end_q(self))
{
rb_raise(rb_eEOFError, "no more elements available");
return Qnil;
}
Generator* gen;
Data_Get_Struct(self, Generator, gen);
int curr = gen->curr;
return rb_ary_entry(gen->values, curr);
}

static VALUE t_next(VALUE self)
{
if(t_end_q(self))
{
rb_raise(rb_eEOFError, "no more elements available");
return Qnil;
}
Generator* gen;
Data_Get_Struct(self, Generator, gen);
int curr = gen->curr++;
VALUE temp = rb_ary_entry(gen->values, curr);
return temp;
}

static VALUE t_each(VALUE self)
{
Generator* gen;
Data_Get_Struct(self, Generator, gen);
gen->curr = 0;
rb_iterate(rb_each, gen->values, rb_yield, 0);
return self;
}

static void gen_free(void* p)
{
free(p);
}

static void gen_mark(void* p)
{
Generator* g = p;
rb_gc_mark(g->values);
}

static VALUE gen_alloc(VALUE klass)
{
Generator* gen = malloc(sizeof(Generator));
VALUE obj;
obj = Data_Wrap_Struct(klass, gen_mark, gen_free, gen);
return obj;
}

VALUE cGen;
void Init_generator_j()
{
id_entries = rb_intern("entries");
cGen = rb_define_class("GeneratorJ", rb_cObject);
rb_define_method(cGen, "initialize", t_init, -1);
rb_define_method(cGen, "next?", t_next_q, 0);
rb_define_method(cGen, "next", t_next, 0);
rb_define_method(cGen, "end?", t_end_q, 0);
rb_define_method(cGen, "pos", t_pos, 0);
rb_define_method(cGen, "index", t_pos, 0);
rb_define_method(cGen, "rewind", t_rewind, 0);
rb_define_method(cGen, "yield", t_yield, 1);
rb_define_method(cGen, "current", t_current, 0);
rb_define_method(cGen, "each", t_each, 0);

rb_define_alloc_func(cGen, gen_alloc);

}
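
A note on the #entries question above: Enumerable itself defines
#entries (an alias of #to_a) in terms of #each, so any class that mixes
in Enumerable and implements #each will respond to it. A quick check in
plain Ruby:

class Pair
  include Enumerable
  def each
    yield 1
    yield 2
  end
end

Pair.new.entries    # => [1, 2]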

I haven’t written much C in a long while (let alone Ruby in C), so be
kind. It doesn’t seem to be much of a speed improvement at all over the
same code written in straight Ruby, but it was educational for me.

I did mess with trying to get infinite generators to work, but I
cheated and looked at the code after a bit, so I was tainted after
that. Here is a test case I made, however, if it’s of any use (it
requires ‘matrix’, of course):

def fib(n) (Matrix[[1,1],[1,0]]**(n-1))[0,0] end
def test_fib
    g = Generator.new do |g|
        a, b = 0, 1
        while true
            g.yield a
            a, b = b, a+b
        end
    end
    100.times do |i|
        assert_equal(i, g.pos)
        assert_equal(fib(i), g.next)
    end
end

Thanks!

-----Horndude77

On 2/12/06, Ross B. [email protected] wrote:

(Incidentally, Jacob, under 1.8 yours was testing about twice as fast
as mine, but it seems to deadlock about half of the time? I have to
interrupt it after leaving it for up to a minute. I’m on Ruby 1.8.4
i686-linux.)

Hmm, while I didn’t run into any deadlocks in my testing, I’m not
too surprised. I’m not very threading-savvy and could easily have made
some critical mistake. :)

I also realized that the implementation I posted to the list doesn’t
rewind correctly. Currently it just drops the @position marker back to
zero, but doesn’t reset the @values array or the generating block. This
is bad if the generator is rewound and the block should generate
different values on different runs (e.g. it’s time-dependent or
sensitive to the environment).
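
A sketch of the kind of rewind being described, with heavy assumptions:
the names @position, @values, @block and @thread are taken from or
inferred from the description above, not from the posted code.

def rewind
  @thread.kill if @thread && @thread.alive?    # stop the old producer
  @position = 0
  @values   = []                               # forget cached results
  @thread   = Thread.new { @block.call(self) } # re-run the generating block
  self
end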

Jacob F.

Here’s my solution. Mine is pretty similar in spirit to Caleb’s
solution, except that we use different locking mechanisms. My first
take was to replace the Continuation with a thread and a SizedQueue.
That is, in #initialize, I create a new thread with the given block (or
the one I generate from the given enum), which eventually blocks when
writing to the size-one queue in #yield. Once #next dequeues the head,
the #yield thread continues, etc.

This passed James’s TC_Generator test suite, but miserably failed on
Luke’s “shared state” test, although the code was supposed to handle the
case.

It turned out that, if a SizedQueue of size one is the only blocking
mechanism, you have two values waiting at the queue’s door: one in the
queue, the other right before the queue, waiting to be unlocked. This
made the response to Luke’s test 1, 1, 2, 3, 4 (and then 10, 10, 10, …
if I increase the repetition). I needed to make the thread stop right
after it enqueued a value, until #next consumes it.
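
A small standalone demonstration of that effect (not Jesse’s benchmark
code, just an illustration with a SizedQueue of size 1):

require 'thread'

q = SizedQueue.new(1)
producer = Thread.new do
  i = 0
  loop { q.push(i += 1) }   # blocks as soon as the queue already holds one element
end

sleep 0.1
# By now element 1 sits in the queue and the producer has already
# computed element 2 and is blocked trying to push it -- the producer
# has run further ahead than a strictly synchronous generator allows.
p q.pop    # => 1
producer.kill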

My solution was to get rid of the SizedQueue and to use a Mutex and a
ConditionVariable to accomplish just that. At that point I saw Caleb’s
solution and thought that starting and stopping a thread should be much
slower than using Mutexes and Cond_Vars. To my surprise, that wasn’t
the case: the Mutex mechanism was much slower than Caleb’s
thread-switching solution.

Anyways, here’s the code. The benchmark follows the code (I ran it on a
slower notebook).

Thanks James for the nice quiz.

Jesse

#!/usr/bin/env ruby
require 'thread'

class Generator
  include Enumerable

  def initialize(enum = nil, &block)
    if enum
      @block = proc { |g|
        enum.each { |x| g.yield x }
      }
    else
      @block = block
    end

    @index = 0
    @queue = []
    @q_access = Mutex.new
    @q_consumed = ConditionVariable.new

    @thread = Thread.new(self, &@block)

    self
  end

  def yield(value)
    @q_access.synchronize {
      @queue << value
      @q_consumed.wait(@q_access)
    }

    self
  end

  def end?()
    Thread.pass while @queue.empty? && @thread.alive?
    @queue.empty? && !@thread.alive?
  end

  def next?()
    !end?
  end

  def index()
    @index
  end

  def pos()
    @index
  end

  def next()
    if end?
      raise EOFError, "no more elements available"
    end
    ret = nil
    @q_access.synchronize {
      @index += 1
      ret = @queue.shift
      @q_consumed.signal
    }

    ret
  end

  def current()
    if end?
      raise EOFError, "no more elements available"
    end

    @queue.first
  end

  def rewind()
    initialize(nil, &@block) if @index.nonzero?

    self
  end

  def each
    rewind

    until end?
      yield self.next
    end

    self
  end
end

tests = 10

enum = (1...1000).to_a

Construction

Rehearsal -------------------------------------------------------------
Old callcc Generator           0.000000   0.000000   0.000000 (  0.003000)
Caleb's SynchronousGenerator   0.000000   0.000000   0.000000 (  0.003000)
Jesse's FasterGenerator        0.010000   0.010000   0.020000 (  0.016000)
---------------------------------------------------- total: 0.020000sec

                                   user     system      total        real
Old callcc Generator           0.000000   0.010000   0.010000 (  0.003000)
Caleb's SynchronousGenerator   0.000000   0.000000   0.000000 (  0.002000)
Jesse's FasterGenerator        0.000000   0.000000   0.000000 (  0.003000)

next()

Rehearsal -------------------------------------------------------------
Old callcc Generator           4.116000   0.270000   4.386000 (  4.438000)
Caleb's SynchronousGenerator   1.181000   0.010000   1.191000 (  1.194000)
Jesse's FasterGenerator        2.674000   0.000000   2.674000 (  2.831000)
---------------------------------------------------- total: 8.251000sec

                                   user     system      total        real
Old callcc Generator           4.066000   0.010000   4.076000 (  4.099000)
Caleb's SynchronousGenerator   1.212000   0.000000   1.212000 (  1.222000)
Jesse's FasterGenerator        2.704000   0.000000   2.704000 (  2.706000)

On Feb 13, 2006, at 3:07 AM, Jesse Y. wrote:

My solution was to get rid of the SizedQueue and to use a Mutex and a
ConditionVariable to accomplish just that. At that point I saw Caleb’s
solution and thought that starting and stopping a thread should be much
slower than using Mutexes and Cond_Vars. To my surprise, that wasn’t
the case: the Mutex mechanism was much slower than Caleb’s
thread-switching solution.

I’m pretty sure they learned the same lesson changing the standard
library. Have a look at these commit messages:

  • lib/generator.rb: uses Mutex instead of Thread.critical.
    [ruby-dev:28184]

Then later:

Sorry, reverted. Mutex is damn slow…

:)

James Edward G. II

On 2/13/06, Jesse Y. [email protected] wrote:

My solution was to get rid of the SizedQueue and to use a Mutex and a
ConditionVariable to accomplish just that. At that point I saw Caleb’s
solution and thought that starting and stopping a thread should be much
slower than using Mutexes and Cond_Vars. To my surprise, that wasn’t
the case: the Mutex mechanism was much slower than Caleb’s
thread-switching solution.

A confession: I had just the tiniest peek at Jacob’s entry before I
wrote that part of mine. I wasn’t trying to, but my eyes did pass over
the phrase ‘Thread.stop’ in his code, so any credit for a clever
implementation should go to him.

On Feb 10, 2006, at 7:53 AM, Ruby Q. wrote:

This week’s Ruby Q. is to write FasterGenerator, your own
re-implementation of Generator with an eye towards working faster.

Is anyone willing to benchmark the submitted solutions, the old
callcc generator, and the new threaded generator for me? I’m not
usually much of a fan, but it’s probably worth seeing them this time,
and I’m horribly busy right now.

A big thank you in advance to anyone who takes up the call…

James Edward G. II

James G. wrote:

I’m pretty sure they learned the same lesson changing the standard
library. Have a look at these commit messages:

  • lib/generator.rb: uses Mutex instead of Thread.critical.
    [ruby-dev:28184]

Then later:

Sorry, reverted. Mutex is damn slow…

:)

James Edward G. II

Aha! Well, then we have a strong candidate for the next quiz, namely
BetterMutex? :slight_smile:

Jesse

On Feb 12, 2006, at 10:55 PM, Yukihiro M. wrote:

I’d be happy to replace the current implementation of generator.rb with
the winning one.

I’m not sure we will do any better than the current implementation,
for the general case it needs to solve. That’s actually what sparked
the idea for this quiz. We will see what people come up with though…

James Edward G. II

On Tue, 2006-02-14 at 01:14 +0900, Jacob F. wrote:

On 2/12/06, Ross B. [email protected] wrote:

(Incidentally, Jacob, under 1.8 yours was testing about twice as fast
as mine, but it seems to deadlock about half of the time? I have to
interrupt it after leaving it for up to a minute. I’m on Ruby 1.8.4
i686-linux.)

Hmm, while I didn’t run into any deadlocks in my testing, I’m not
too surprised. I’m not very threading-savvy and could easily have made
some critical mistake. :)

It’s an easy mistake to make. When you stop threads right away you need
to wait for them in the main thread; otherwise it’s possible
(increasingly so the faster your computer is) that the code following
the Thread.new block will be executed before the new thread gets to the
Thread.stop.
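
An illustrative reduction of that race (not Jacob’s code): the consumer
must wait until the worker has really parked itself before calling run,
which is exactly what the added Thread.pass line in the patch below does.

worker = Thread.new do
  Thread.stop           # park until the consumer wakes us up
  # ... produce the next value here ...
end

# Without this wait, the main thread can call worker.run before the
# worker has reached Thread.stop; that wakeup is lost, the worker then
# stops anyway, and everyone waits forever -- the intermittent deadlock.
Thread.pass until worker.stop?
worker.run
worker.join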

I’ve been doing some benchmarking and stuff, and the following patch
seems to fix it. It passes all the tests and is pretty quick, so nice
work :))

--- jfugal_faster_generator_orig.rb	2006-02-14 15:09:58.000000000 +0000
+++ jfugal_faster_generator.rb	2006-02-14 14:07:17.000000000 +0000
@@ -68,6 +68,7 @@
       @block[self]
       @mutex.synchronize{ @done = true }
     }
+    Thread.pass until @thread.stop?
     @thread.run if @thread.status and need_value?
     Thread.pass while need_value?
   end

On Feb 14, 2006, at 10:42 AM, Ross B. wrote:

Here are some I ran up on Ruby 1.8.4-2005-12-24, and results of test
runs too.

We all owe Ross a huge thank you for this effort!

James Edward G. II

On Wed, 2006-02-15 at 01:42 +0900, Ross B. wrote:

I also (having too little to do, and too much time to do it in) ran
James’ original test-cases, plus the realtime test posted in the NG and
a simple endless iterator test (posted below the results). Two passed
all, two passed endless but not realtime, and the others didn’t pass
either (but still passed the basic tests of course).

Oops, it was actually three that passed endless but not realtime - I
missed the result for Caleb C.'s entry when I said that.