Mongrel garbage collection

On Tue, Mar 25, 2008 at 4:40 AM, James T. [email protected]
wrote:

Forgive me for not having read the whole thread, however, there is one thing
that seems to be really important, and that is, ruby hardly ever runs the
damned GC. It certainly doesn’t do full runs nearly often enough (IMO).

There’s only one kind of garbage collection sweep. And yeah,
depending on what’s happening, GC may not run very often. That’s not
generally a problem.

Also, implicit OOMEs or GC runs quite often DO NOT affect the extensions
correctly. I don’t know what rmagick is doing under the hood in this area,
but having been generating large portions of country maps with it (and
moving away from it very rapidly), I know the GC doesn’t do “The Right
Thing”.

There should be no difference between a GC run that is initiated by
the interpreter and one that is initiated by one’s code. It ends up
calling the same thing in gc.c. Extensions can easily mismanage
memory, though, and I have a hunch about what’s happening with
rmagick.

First port of call is GC_MALLOC_LIMIT and friends. For any small script
that doesn’t breach that value, the GC simply doesn’t run. More than this,
RMagick, in its apparent ‘wisdom’, never frees memory if the GC never runs.
Seriously, check it out. Make a tiny script, and make a huge image with it.
Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will
reach your code before the GC calls on RMagick to free.

Now, add a call to GC.start, and no OOME. Despite its limitations (ruby
performance only, IMO), most of the above experience was built up on
Windows, and last usage was about 6 months ago, FYI.
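The experiment described above can be approximated without RMagick at all. This is a hedged, pure-Ruby stand-in (plain Strings play the role of the huge images, and `GC.count` on modern Rubies is used to observe sweeps):

```ruby
# Pure-Ruby stand-in for the experiment above: churn out large throwaway
# buffers, then force a full sweep explicitly. Strings stand in for
# Magick::Image objects; with a real C extension, the external allocations
# would not even count against GC_MALLOC_LIMIT, so implicit GC is even
# less likely to fire.
def make_big_buffer
  "x" * (10 * 1024 * 1024) # ~10 MB of garbage per call
end

runs_before = GC.count       # number of GC runs so far
20.times { make_big_buffer }
GC.start                     # explicit full sweep, instead of waiting for an OOME
puts GC.count > runs_before  # true: at least the explicit sweep has run
```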

My hunch is that rmagick is allocating large amounts of RAM outside of
Ruby. It registers its objects with the interpreter, but the RAM
usage in rmagick itself doesn’t count against GC_MALLOC_LIMIT because
Ruby didn’t allocate it, so doesn’t know about it.

So, it uses huge amounts of RAM, but doesn’t use huge numbers of
objects. Thus you never trigger a GC cycle by exceeding the
GC_MALLOC_LIMIT nor by running out of object slots in the heap. I’d
have to go look at the code to be sure, but the theory fits the
behavior that is described very well.

I don’t think this is a case for building GC.foo memory management
into Mongrel, though. As I think you are suggesting, just call
GC.start yourself in your code when necessary. In a typical Rails app
doing big things with rmagick, the extra time to do GC.start at the
end of the image manipulation, in the request handling, isn’t going to
be noticeable.
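The advice above can be sketched as a tiny, framework-free wrapper. The names here are hypothetical; in a real Rails app the GC.start would simply go at the end of the controller action that does the rmagick work:

```ruby
# Minimal sketch of the suggestion above: do the expensive request work,
# then trigger a full sweep before handing back the response. GcAfterRequest
# is a made-up name, and `heavy` stands in for the image-manipulating
# request handler.
class GcAfterRequest
  def initialize(app) # app: anything that responds to #call
    @app = app
  end

  def call(env)
    result = @app.call(env)
    GC.start # sweep this request's garbage before the next request arrives
    result
  end
end

# usage: wrap whatever does the heavy lifting
heavy   = ->(env) { "x" * 1_000_000; "response for #{env}" }
handler = GcAfterRequest.new(heavy)
puts handler.call("req-1") # prints "response for req-1"
```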

But that’s not really the overall point. My overall point is how to
properly handle a rails app that uses a great deal of memory during each
request. I’m pretty sure this happens in other rails applications that
don’t happen to use ‘RMagick’.

Personally, I’ll simply say call the GC more often. Seriously. I mean it.
It’s not that slow, not at all. In fact, I call GC.start explicitly inside
of my ubygems.rb due to stuff I have observed before.

I completely concur with this. If there are issues with huge memory
use (most likely caused by extensions making RAM allocations outside
of Ruby’s accounting, so implicit GC isn’t triggered), just call
GC.start in one’s own code.

Now, by my reckoning (and a few production apps seem to be showing it
empirically (purely empirical, sorry)) we should be calling on the GC whilst
loading up the apps. I mean come on, when is a really serious number of
temporary objects being created? Actually, it’s when rubygems loads, and
that’s the first thing that happens in, hmm, probably over 90% of ruby
processes out there.

Just as a tangent, I do this in Swiftiply. I make an explicit call to
GC.start after everything is loaded and all configs are parsed, just
to make sure execution is going into the main event loop with as much
junk cleaned out as possible.

Or whatever. It doesn’t really matter that much where you do this, or when,
it just needs to happen every now and then. More importantly, add a GC.start
to the end of environment.rb, and you will have literally half the number of
objects in ObjectSpace.
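The “half the number of objects” claim is easy to check against one’s own app. A sketch of the before/after measurement one would do at the end of environment.rb, with throwaway strings standing in for boot-time garbage:

```ruby
# Stand-in for boot-time garbage: allocate a pile of strings that become
# unreachable as soon as the method returns, roughly the way rubygems'
# spec loading leaves objects behind during app boot. The count is taken
# while the garbage is still live, so the before/after gap is visible.
def boot_and_count
  junk = Array.new(100_000) { "boot-time garbage " * 4 }
  ObjectSpace.each_object(String).count # junk is still reachable here
end

before = boot_and_count # junk dies when the method returns
GC.start                # the explicit sweep one would put in environment.rb
after  = ObjectSpace.each_object(String).count

puts "live strings before: #{before}, after: #{after}"
```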

This makes sense to me.

I could also see providing a 2nd Rails handler that had some GC
management stuff in it, along with some documentation on what it
actually does or does not do, so people can make an explicit choice to
use it, if they need it. I’m still completely against throwing code
into Mongrel itself for this sort of thing. I just prefer not to
throw more things into Mongrel than we really need to, when there is
no strong argument for them being inside of Mongrel itself. GC.start
stuff is simple enough to put into one’s own code at appropriate
locations, or to put into a customized Mongrel handler if one needs
it.

Maybe this simply needs to be documented in the body of Mongrel
documentation?

Kirk H.

On Tue, Mar 25, 2008 at 11:02 AM, Evan W. [email protected] wrote:

My hunch is that rmagick is allocating large amounts of RAM outside of
Ruby. It registers its objects with the interpreter, but the RAM
usage in rmagick itself doesn’t count against GC_MALLOC_LIMIT because
Ruby didn’t allocate it, so doesn’t know about it.

It’s allocating opaque objects on the Ruby heap but not using Ruby’s
built-in malloc? That seems pretty evil.

Not really. It’s pretty common in extensions. You alloc your
structures in whatever way is appropriate for the library you are
using, then use Data_Wrap_Struct with a mark and a free function to
hook your stuff into the Ruby garbage collector.

Your objects are thus known to Ruby as Ruby objects, but you have
potentially large chunks of memory that Ruby itself knows nothing
about.
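The Data_Wrap_Struct pattern described above is a C-level API, but it has a rough pure-Ruby analogue using finalizers, which may make the shape of the problem clearer. Everything here (class name, sizes) is illustrative, and a plain String stands in for memory a C extension would malloc itself:

```ruby
# Rough pure-Ruby analogue of the Data_Wrap_Struct pattern: a small wrapper
# object is registered with the interpreter, while the big allocation lives
# "outside". Ruby sees one tiny object; the free hook only runs when the GC
# actually collects the wrapper.
class ExternalBuffer
  # Build the free hook without capturing `self` (capturing the wrapper in
  # its own finalizer would keep it alive forever).
  def self.finalizer(buffer_id)
    proc { |_| puts "free hook ran for buffer #{buffer_id}" }
  end

  def initialize(size)
    @buffer = "\0" * size # stands in for externally-allocated memory
    ObjectSpace.define_finalizer(self, self.class.finalizer(object_id))
  end

  def size
    @buffer.bytesize
  end
end

buf = ExternalBuffer.new(10 * 1024 * 1024)
puts buf.size # 10485760 bytes that sit around until the wrapper is swept
```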

Kirk H.

On 25 Mar 2008, at 15:26, Kirk H. wrote:

depending on what’s happening, GC may not run very often. That’s not
generally a problem.

Sure, inside ruby there’s only one kind of run, but…

There should be no difference between a GC run that is initiated by
the interpreter and one that is initiated by one’s code. It ends up
calling the same thing in gc.c. Extensions can easily mismanage
memory, though, and I have a hunch about what’s happening with
rmagick.

I just realised the obvious truth, that ruby isn’t actually running
the GC under those OOME conditions.

reach your code before the GC calls on RMagick to free.
Ruby didn’t allocate it, so doesn’t know about it.
Yup, it’s ImageMagick, un-patched, and afaik they don’t provide a
callback to replace malloc; or maybe that’s an rmagick issue.

So, it uses huge amounts of RAM, but doesn’t use huge numbers of
objects. Thus you never trigger a GC cycle by exceeding the
GC_MALLOC_LIMIT nor by running out of object slots in the heap. I’d
have to go look at the code to be sure, but the theory fits the
behavior that is described very well.

Right, in fact, I think the OOME actually comes from outside of ruby
(unverified), and ruby can’t or won’t run the GC before going down, as
the free() calls inside RMagick / ImageMagick aren’t happening without
calling GC.start. The GC.start call, somewhere, somehow, is being used to
trigger frees in the framework. Personally, this is bad design, and
the really common complaints may also suggest so; however, I don’t
know what their domain-specific issues and limitations are. Maybe it’s
an ImageMagick thing.

Creating an OOME inside ruby, the interpreter calls on GC.start prior
to going down. I started talking to zenspider about this stuff, and
eventually he just pointed me at gc.c, fair enough. I still hold the
opinion that an OOME hitting the interpreter (from whatever source)
should attempt to invoke the GC. Of course, a hell of a lot of
software doesn’t check the result of a call to malloc(), tut tut.

Tool: http://ideas.water-powered.com/projects/libgreat

I don’t think this is a case for building GC.foo memory management
into Mongrel, though. As I think you are suggesting, just call
GC.start yourself in your code when necessary. In a typical Rails app
doing big things with rmagick, the extra time to do GC.start at the
end of the image manipulation, in the request handling, isn’t going to
be noticeable.

Absolutely right, and yes, this is my opinion.

I make an explicit call to GC.start after everything is loaded and all
configs are parsed, just to make sure execution is going into the main
event loop with as much junk cleaned out as possible.
I’ve done similar in anything that is running as a fire-and-forget
style daemon. You know, the kinds of things that get set up once, and
run for 1 to 20 years. There are several that I have never restarted.
No rails, though. In these kinds of things I also simply don’t want to
waste RAM on silly fragmentation; the next allocation takes you up
to a registerable percentage on medium-aged machines. IIRC there’s one
in my copy of analogger too, or maybe you had that in there already :)

I could also see providing a 2nd Rails handler that had some GC
management stuff in it, along with some documentation on what it
actually does or does not do, so people can make an explicit choice to
use it, if they need it. I’m still completely against throwing code
into Mongrel itself for this sort of thing. I just prefer not to
throw more things into Mongrel than we really need to, when there is
no strong argument for them being inside of Mongrel itself. GC.start
stuff is simple enough to put into one’s own code at appropriate
locations, or to put into a customized Mongrel handler if one needs
it.

If it wasn’t app specific I’d say put it in mongrel. It is, though, and
people’s tendency to pre-optimize probably makes this pointless.

I mean the cost of doing it in a thread under eventmachine is way
higher than the ram usage costs for pure ruby apps, at least for my
pure ruby apps. 20-40mb vs. lots of req. / sec.

But then, one could check for better alternatives, like add_timer(),
etc., but that route tends towards bloat, so your original assertion,
put it in the app configuration, is what I would choose.

Maybe this simply needs to be documented in the body of Mongrel
documentation?

Maybe not even there. I think research needs to be done into the
longer running effects of the GC under real environments. I know some
people have done some (including myself), but the results are never
released in public. The GC also seems to be one of those topics, as
it’s so close to performance, where people are happy to see how high
up the wall they can go, prior to doing research.

With regard to mongrel and this stuff, it’s really not a mongrel
issue. Mongrel is a great citizen wrt the GC (at least by comparison
to a lot of other code).

Particularly bad citizens in this area include:

  • Every single pure ruby pdf lib I’ve seen
  • rubygems (by way of the spec loading semantics, not rubygems
    itself, kinda (let’s just say, I’d do it differently, but by design, not
    implementation))
  • rails
  • rmagick

At 03:41 AM 3/25/2008, [email protected] wrote:

country maps with it (and moving away from it very rapidly), I know
the GC doesn’t do “The Right Thing”.
[snip]

Hi James,

My understanding with RMagick is that it is hooking the Imagemagick
libs directly in process. As a result, memory is not always freed when
you’d expect it to be. I haven’t read up on the details, having chosen
to just use out of process image management, but you might find this
link interesting - in it, there is a claim that the latest releases of
RMagick do not in fact leak any memory and that running a full GC
manually will reclaim all memory it uses after the references are out
of scope.

http://rubyforge.org/forum/forum.php?thread_id=1374&forum_id=1618

Steve

On 25 Mar 2008, at 17:05, Steve M. wrote:

link interesting - in it, there is a claim that the latest releases of
RMagick do not in fact leak any memory and that running a full GC
manually will reclaim all memory it uses after the references are out
of scope.

Thank you for kindly ensuring that I got this. We actually moved
completely away from anything ImageMagick based. There really wasn’t
any sensible way to ‘fix’ it.

Whilst destroy! looks ok, even when doing what we were (high res
tiling, covering around 250 square miles), we found performance was
fine and we could avoid all allocation issues by using the crazy
thread solution (Thread.new { loop { sleep some_time; GC.start } }).
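The “crazy thread solution” mentioned above, spelled out as a runnable sketch. The interval is a made-up placeholder; it would need tuning against real request rates and heap sizes:

```ruby
# Background thread that forces a full sweep every GC_INTERVAL seconds,
# independent of allocation pressure. GC_INTERVAL is a hypothetical value.
GC_INTERVAL = 0.1 # seconds; placeholder, tune for the real workload

gc_thread = Thread.new do
  loop do
    sleep GC_INTERVAL
    GC.start # periodic full sweep
  end
end

# main work would continue here; the background thread keeps sweeping
runs_before = GC.count
sleep GC_INTERVAL * 3
puts GC.count > runs_before # true: the thread has forced at least one sweep
gc_thread.kill
```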

This is all good in most scenarios but then there are times when
running a framework like eventmachine, where threads (yes, even green
ones) can be total performance killers too. Mind you, under rails,
there’s always a linear reaction run, so I’m not going to speculate
more on that detail. It’s also OT for here, mostly…


Thanks again,

James.

P.S. Personally, if I was coming up against this problem today, I’d
drop out to a separate process, driven by something like a background
job if under rails. If under pure ruby, I’d use drb or eventmachine +
a marshalling protocol, depending on specific requirements. The
biggest issue for our old project was hitting swap / page file. Image
rendering, when you’re already working at the per-pixel layer, is
really easy to divide up, though, so optimizing for speed is pretty
easy.

When it comes to background concurrent scheduling, staying away from
ACID can really help, too, but that really is another topic for
another time. Let’s just say, allow slack, and life will be easier if
you ever hit a silly scale. I’ve seen people trying to recover broken
ACID implementations by trawling logs, and my god, tearful.

You’re right, ok. So the memory causing the OOM error isn’t actually
on the Ruby heap, but it can’t get freed until the opaque object gets
GC’d.

Evan