Error: Mongrel timed out this thread: too many open files

I just switched to Mongrel, and it’s been working much better than my
previous lighttpd/fastcgi setup. So thanks for the awesomeness.

My current problem: once or twice an hour, I get the following error in
production:

Mongrel timed out this thread: too many open files

I never get it in testing or on our staging server. Any ideas what would
cause that? It doesn’t appear particularly correlated with load to me, but
I’m only receiving notifications after the fact so I can’t be sure.

Thanks,
Emmett

On Thu, 29 May 2008 13:07:27 -0700
“Emmett S.” [email protected] wrote:

> I’m only receiving notifications after the fact so I can’t be sure.

A couple of things cause this. One is that the mongrel is overloaded with
too many connections, so it can’t accept any more.

If there isn’t that much load on the server, then it’s more likely
that you are leaking an open file here or there. If you are writing code
like this:

a = open("blah.txt")
a.write("hi")
a.close()

Then you are probably leaking files. Look for that, and then translate
to the block form:

open("blah.txt") { |a| a.write("hi") }

That’s probably the #1 mistake people make from other languages.
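A quick way to actually watch the leak happen, as a minimal sketch (Linux
only, and the filename is just a placeholder): /proc/<pid>/fd holds one
entry per open descriptor, so count handles before and after an exception
skips the close.

def fd_count
  # each open descriptor shows up as one entry in /proc/<pid>/fd (Linux only)
  Dir.entries("/proc/#{Process.pid}/fd").length
end

before = fd_count
begin
  f = File.open("blah.txt", "w")
  raise "boom"   # any failure between open and close...
  f.close        # ...means this line never runs
rescue RuntimeError
end
puts fd_count - before   # => 1: the handle is still open (leaked)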


Zed A. Shaw

Emmett,

Contrary to what Zed’s message seems to imply, there is nothing
inherently wrong with code like:

a = File.open("blah.txt")
a.write("hi!")
a.close()

You simply need to understand that if any error occurs during
a.write(…) or a similar call, then a.close will not be invoked. If
you use error handling like

a = File.open("blah.txt")
begin
  a.write("hi!")
ensure
  a.close()
end

then you will ensure that the file is actually closed regardless of any
exception. Of course a block like that is kind of ugly, so it’s better
to do what Zed suggested and associate a code block with the open call.
This means that even if the block faults the file is closed; it’s just
cleaner syntax.
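A two-line check of that claim (the filename is again just a placeholder):

handle = nil
begin
  File.open("blah.txt", "w") do |f|
    handle = f
    f.write("hi!")
    raise "boom"        # fault inside the block
  end
rescue RuntimeError
end
puts handle.closed?     # => true: File.open closed the file on the way out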

Here are some links that explain it too:

http://www.meshplex.org/wiki/Ruby/File_handling_Input_Output
http://www.math.hokudai.ac.jp/~gotoken/ruby/ruby-uguide/uguide25.html

– Brian


On Sat, 31 May 2008 21:03:21 -0700
“Emmett S.” [email protected] wrote:

> • Observe that updates and inserts in the database (postgres) are becoming
>   slow. And by slow, I mean 30-40 seconds for a simple insert or update
>   where it previously took less than 0.1 seconds. Load on DB server itself
>   remains nominal; less than 2 on an 8 core box. No error messages of
>   importance that I can see. Inserts and updates from other sources
>   (script/console, psql) are fast.

Well, it sounds like your site already has some traffic. Without
getting into a remote debugging session, have you checked your indexes
to make sure you’re adding the right ones to the right columns?

If you were, say, entering a ton of strings into a DB and then querying
for them with insane LIKE clauses, you’d see this kind of behavior. As
you added more rows, your app would get slower and slower.
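If a missing index turns out to be the culprit, adding one is a one-line
migration. A hypothetical sketch (the table and column names are made up
for illustration):

class AddIndexToUsersEmail < ActiveRecord::Migration
  def self.up
    add_index :users, :email    # hypothetical table/column
  end

  def self.down
    remove_index :users, :email
  end
end

Note that a LIKE with a leading wildcard ('%foo%') can’t use an ordinary
b-tree index at all, which is part of why those clauses hurt more and more
as the table grows.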


Zed A. Shaw

Looks like I was overloading the mongrels with connections… I took down
the number of connections allowed in HAProxy and it looks like the
problem went away. So, thanks!
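For the record, the knob involved is HAProxy’s per-server maxconn; a rough
sketch of the idea (backend and server names are illustrative):

backend mongrels
    # queue excess requests in HAProxy instead of piling them onto each
    # mongrel, which processes Rails requests one at a time
    server app0 127.0.0.1:8000 maxconn 1
    server app1 127.0.0.1:8001 maxconn 1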

This has uncovered a new problem though, one that’s truly baffling me:

  • Start up mongrel instances. Everything is awesome. Site is fast, life
    is good.
  • Wait 30-40 minutes.
  • Observe that updates and inserts in the database (postgres) are becoming
    slow. And by slow, I mean 30-40 seconds for a simple insert or update
    where it previously took less than 0.1 seconds. Load on DB server itself
    remains nominal; less than 2 on an 8 core box. No error messages of
    importance that I can see. Inserts and updates from other sources
    (script/console, psql) are fast.

This started happening just after switching from fcgi to mongrels. Could
it be that something is different about how it handles database
connections? Was I relying on some kind of bug before?

E

On Sat, May 31, 2008 at 7:11 PM, Brian W. [email protected] wrote:

At first, I thought I’d messed up something in the database too. But
running the exact same updates and inserts against the production
database, through the console, yields normal, fast results. The only
place I see these 30-40 second updates/inserts is from mongrels that have
been under load for a while; I don’t see the slowness when running the
exact same things from console, or from the old FCGI setup.

What could be different about doing the database queries in Mongrel that
could cause this? I’m not too clear on exactly how Mongrel differs from
FCGI, other than being faster and not using FCGI (the protocol). Could it
be possible that the database connections are longer lived, or somehow
shared between multiple threads, or something like that? I start with the
assumption that Mongrel does things the right way and that I’ve made some
mistake in configuring my application, but I’m at a loss as to where to
start looking.

Thanks,
Emmett

Tikhon Bernstam wrote:

> I think I’ve seen the problem you’ve described when using acts_as_ferret
> + a ferret DRb server

I have exactly the same problem.

Initially I was running acts_as_ferret with a DRb server in a
mongrel_cluster and it was working ok. Then I changed a field in a table
and restarted the mongrel_cluster. That’s when it stopped working (same
error as posted). I restored the backup version and dropped the field
that I had created, but the same thing happens.

In the ‘development’ environment with a single instance of mongrel it
works though, using acts_as_ferret and the DRb server.

Hi Emmett,

I think I’ve seen the problem you’ve described when using acts_as_ferret
+ a ferret DRb server (though I’ll assume you aren’t actually using
ferret; as the inimitable Engine Y. guys pointed out this weekend during
one of their talks, ferret is a common cause of problems for their users.
I haven’t played with ferret in months btw, so this example might be
outdated, but it illustrates a more general problem, I think). In this
ferret case, the problem, I believe, is that when you have some model Foo
that uses acts_as_ferret and you call foo.save, the COMMIT on the save
transaction occurs after the ferret after_create/after_update hooks. So
the COMMIT occurs after the call to the ferret DRb server. Normally this
is ok, but if you are indexing large amounts of text (for example) or the
DRb server gets busy for whatever reason, we saw that the save
transactions can suddenly take a long time.

The example above illustrates a more general point, I think: be careful
with what you’re doing in your AR hooks. Again, the problem is that when
you save your AR object, that save is wrapped in a transaction, and the
commit on that transaction occurs after AR hooks like after_create.

To verify this, here’s a simple example:

script/generate model foo && rake db:migrate

class Foo < ActiveRecord::Base
  after_create { sleep 10 }
end

then, from script/console:

foo = Foo.create

Now watch your database: the transaction begins, but the COMMIT doesn’t
occur until after the 10 seconds of sleep.
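One mitigation, as a rough sketch (the indexer class and method below are
made up): defer the slow work so the hook returns immediately and the
COMMIT isn’t held up. A bare Thread is only for illustration; a real app
would hand this to a job queue, and note the thread can even run before
the row is actually committed.

class Foo < ActiveRecord::Base
  after_create :index_later

  private

  def index_later
    # the hook returns right away; the slow call (DRb, S3, ...) no longer
    # sits inside the save transaction holding up the COMMIT
    Thread.new(id) do |foo_id|
      SomeDrbIndexer.index(foo_id)   # hypothetical indexer, for illustration
    end
  end
end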

So what plugins are you using? And are you using any interesting AR hooks
that could potentially take a long time (like talking to a DRb server or
uploading files to s3 as an after_create, for example)?

Best,

Tikhon Bernstam

Co-founder, Scribd.com