On Tue, Mar 25, 2008 at 4:40 AM, James T. [email protected]
wrote:
Forgive me for not having read the whole thread, however, there is one thing
that seems to be really important, and that is, ruby hardly ever runs the
damned GC. It certainly doesn’t do full runs nearly often enough (IMO).
There’s only one kind of garbage collection sweep. And yeah,
depending on what’s happening, GC may not run very often. That’s not
generally a problem.
Also, implicit OOMEs or GC runs quite often DO NOT affect the extensions
correctly. I don’t know what rmagick is doing under the hood in this area,
but having been generating large portions of country maps with it (and
moving away from it very rapidly), I know the GC doesn’t do “The Right
Thing”.
There should be no difference between a GC run that is initiated by
the interpreter and one that is initiated by one’s code. It ends up
calling the same thing in gc.c. Extensions can easily mismanage
memory, though, and I have a hunch about what’s happening with
rmagick.
First port of call is GC_MALLOC_LIMIT and friends. For any small script
that doesn’t breach that value, the GC simply doesn’t run. More than this,
RMagick, in its apparent ‘wisdom’, never frees memory if the GC never runs.
Seriously, check it out. Make a tiny script, and make a huge image with it.
Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will
reach your code before the GC calls on RMagick to free.
Now, add a call to GC.start, and no OOME. Despite the limitations of it
(ruby performance only IMO), most of the above experience was built up on
Windows, and last usage was about 6 months ago, FYI.
My hunch is that rmagick is allocating large amounts of RAM outside of
Ruby. It registers its objects with the interpreter, but the RAM
usage in rmagick itself doesn’t count against GC_MALLOC_LIMIT because
Ruby didn’t allocate it, so doesn’t know about it.
So, it uses huge amounts of RAM, but doesn’t use huge numbers of
objects. Thus you never trigger a GC cycle by exceeding the
GC_MALLOC_LIMIT nor by running out of object slots in the heap. I’d
have to go look at the code to be sure, but the theory fits the
behavior that is described very well.
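One rough way to check this theory is to count how many GC runs the interpreter performs on its own while a piece of work executes. The sketch below is only an illustration: gc_runs_during is a name I made up, it relies on GC.count (MRI 1.9 and later), and the string-building workload is a stand-in for real code. If the hunch is right, wrapping the rmagick calls the same way should report few or no runs even while process memory climbs.

```ruby
# Hypothetical helper: report how many GC runs happened while the block ran.
# GC.count is the interpreter's running total of GC passes (MRI 1.9+).
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

# Stand-in workload that creates garbage Ruby itself can see and account for.
ruby_side = gc_runs_during { 200_000.times { "x" * 100 } }
puts "GC runs during Ruby-side allocation: #{ruby_side}"
```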
I don’t think this is a case for building GC.foo memory management
into Mongrel, though. As I think you are suggesting, just call
GC.start yourself in your code when necessary. In a typical Rails app
doing big things with rmagick, the extra time to do GC.start at the
end of the image manipulation, in the request handling, isn’t going to
be noticeable.
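A minimal sketch of what I mean, with the caveat that with_gc_afterwards is a hypothetical helper name and the array arithmetic is only a placeholder for the real image work:

```ruby
# Hypothetical helper: run memory-heavy work, then force a GC pass
# whether the work succeeded or raised.
def with_gc_afterwards
  yield
ensure
  GC.start
end

# Placeholder for the image manipulation done inside a request.
result = with_gc_afterwards do
  Array.new(1_000) { |i| i * 2 }.inject(0) { |sum, n| sum + n }
end
puts result
```

The ensure clause means the collection still happens when the work raises, which is when a half-built image is most likely to be left lying around.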
But that’s not really the overall point. My overall point is how to
properly handle a rails app that uses a great deal of memory during each
request. I’m pretty sure this happens in other rails applications that
don’t happen to use ‘RMagick’.
Personally, I’ll simply say call the GC more often. Seriously. I mean it.
It’s not that slow, not at all. In fact, I call GC.start explicitly inside
of my ubygems.rb due to stuff I have observed before:
I completely concur with this. If there are issues with huge memory
use (most likely caused by extensions making RAM allocations outside
of Ruby’s accounting, so implicit GC isn’t triggered), just call
GC.start in one’s own code.
Now, by my reckoning (and a few production apps seem to be showing
empirically (purely empirical, sorry)) we should be calling on the GC whilst
loading up the apps. I mean come on, when are a really serious number of
temporary objects being created? Actually, it’s when rubygems loads, and
that’s the first thing that happens in, hmm, probably over 90% of ruby
processes out there.
Just as a tangent, I do this in Swiftiply. I make an explicit call to
GC.start after everything is loaded and all configs are parsed, just
to make sure execution is going into the main event loop with as much
junk cleaned out as possible.
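In outline it looks something like the following. This is not the Swiftiply code; boot and the two lambdas are hypothetical stand-ins for whatever loading and config parsing a given process does.

```ruby
loaded = []

# Hypothetical load steps standing in for requires and config parsing.
steps = [
  -> { loaded << :libraries },
  -> { loaded << :config }
]

# Run every load step, then collect load-time garbage once,
# before handing control to the long-running main loop.
def boot(steps)
  steps.each(&:call)
  GC.start
  :ready
end

state = boot(steps)
puts "state=#{state} loaded=#{loaded.inspect}"
```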
Or whatever. It doesn’t really matter that much where you do this, or when,
it just needs to happen every now and then. More importantly, add a GC.start
to the end of environment.rb, and you will have literally half the number of
objects in ObjectSpace.
This makes sense to me.
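Anyone who wants to check the effect on their own app can count live objects before and after the call. A rough sketch, assuming MRI; live_object_count is a made-up helper name, the loop only simulates load-time garbage, and the numbers will vary from one app and Ruby version to the next:

```ruby
# Hypothetical helper: ObjectSpace.each_object, given a block, returns
# the number of objects it visited.
def live_object_count
  ObjectSpace.each_object(Object) { |_obj| }
end

# Simulate the temporary objects created while an app loads.
50_000.times { |i| "temporary #{i}" }

before = live_object_count
GC.start
after = live_object_count

puts "objects before GC.start: #{before}"
puts "objects after GC.start:  #{after}"
```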
I could also see providing a 2nd Rails handler that had some GC
management stuff in it, along with some documentation on what it
actually does or does not do, so people can make an explicit choice to
use it, if they need it. I’m still completely against throwing code
into Mongrel itself for this sort of thing. I just prefer not to
throw more things into Mongrel than we really need to, when there is
no strong argument for them being inside of Mongrel itself. GC.start
stuff is simple enough to put into one’s own code at appropriate
locations, or to put into a customized Mongrel handler if one needs
it.
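To give an idea of how little code such a handler needs, here is a sketch of a wrapper that forces a GC run every N requests. Nothing here is part of Mongrel: PeriodicGCHandler and EchoHandler are hypothetical names, and the only assumption is that the wrapped object responds to process(request, response) the way a Mongrel handler does.

```ruby
# Hypothetical wrapper, not part of Mongrel: delegates to any handler that
# responds to process(request, response) and forces GC every `interval` requests.
class PeriodicGCHandler
  attr_reader :gc_calls

  def initialize(inner, interval = 10)
    @inner = inner
    @interval = interval
    @requests = 0
    @gc_calls = 0
  end

  def process(request, response)
    @inner.process(request, response)
  ensure
    # Count the request even if the inner handler raised.
    @requests += 1
    if (@requests % @interval).zero?
      GC.start
      @gc_calls += 1
    end
  end
end

# Stand-in handler so the sketch runs without Mongrel installed.
class EchoHandler
  def process(request, response)
    response << "handled #{request}"
  end
end

out = []
handler = PeriodicGCHandler.new(EchoHandler.new, 3)
7.times { |i| handler.process("req#{i}", out) }
puts "requests=#{out.size} gc_calls=#{handler.gc_calls}"
```

The counter is not protected by a lock, so a threaded server would want a Mutex around the bookkeeping. The interval is the knob to tune: 1 gives the "GC after every request" behavior, larger values spread the cost out.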
Maybe this simply needs to be documented in the body of Mongrel
documentation?
Kirk H.