We have 2 web servers running Rails 2.3.5 with JRuby 1.4.0 under Tomcat
5.5. They have been up for several days without issue.
This morning one of the web servers started spewing these errors and
Tomcat needed to be restarted:
java.net.SocketException: Too many open files
Tonight the other server did the same thing.
Is there a known cause for this?
I don’t know why it would suddenly break, but any moderately busy
web/application server running under Linux usually needs its max
socket/file handle setting bumped up. The error gets thrown when you
exceed the number of allowed file handles, which includes sockets as
well as files.
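If it helps, here is a quick way to compare a process’s open descriptors
against its limit from Ruby on Linux (a rough sketch; Process.getrlimit
needs 1.9-level support, so there is a shell fallback for older runtimes):

# Rough sketch, Linux only: count this process's open descriptors and
# compare against the soft limit on file handles.
begin
  soft_limit = Process.getrlimit(:NOFILE).first   # Ruby 1.9+ / newer JRuby
rescue NoMethodError, NotImplementedError
  soft_limit = `ulimit -n`.to_i                   # fallback via the shell
end
open_fds = Dir.glob("/proc/#{Process.pid}/fd/*").size
puts "open descriptors: #{open_fds} of #{soft_limit} allowed"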
It’s possible this is due to tempfiles not being cleaned up quickly
enough. Is this on a Unix system? Can you run lsof against it to see
what files it has open?
There are really two things that can cause this:
- Leaking descriptors, eventually becoming unable to create any new ones
- System under high load and descriptors are not being cleaned up fast
enough (like if they’re not explicitly closed and the code leaves GC
to clean them up).
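As a rough illustration of the second case (hypothetical socket use,
nothing app-specific): a descriptor that is never explicitly closed stays
allocated until GC finalizes the object, while closing it in an ensure
block releases it immediately.

require 'socket'

# Leaky pattern: the descriptor stays allocated until GC finalizes the object.
sock = TCPSocket.new('db1', 3306)   # hypothetical host/port
# ... use sock, never close it ...

# Safer pattern: the descriptor is released as soon as the block finishes.
begin
  sock = TCPSocket.new('db1', 3306)
  # ... use sock ...
ensure
  sock.close if sock && !sock.closed?
end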
Hi,
On Mon, Mar 22, 2010 at 9:21 AM, Charles Oliver N.
[email protected] wrote:
- System under high load and descriptors are not being cleaned up fast
enough (like if they’re not explicitly closed and the code leaves GC
to clean them up).
And the good news is that in JRuby 1.5 we’ve improved things
considerably with respect to temp files. We now clean them up much
better, and Rails doesn’t leave any temp files in TMP anymore.
See, for example: http://jira.codehaus.org/browse/JRUBY-4623 -
Tempfile does not clean up on GC run
Plus:
JRUBY-2282: Unclosed, unflushed IO objects might not write contents
before program exit
JRUBY-2475: JRuby and Builder::XmlMarkup (strange File interaction)
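Whatever the JRuby version, it also helps not to lean on finalizers at
all; a minimal sketch of cleaning a Tempfile up explicitly:

require 'tempfile'

t = Tempfile.new('upload')
begin
  t.write("some payload")
  t.flush
  # ... use t.path ...
ensure
  t.close
  t.unlink   # removes the file right away instead of waiting for GC
end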
Thanks,
–Vladimir
What specifically should I be looking for in the lsof output? The ‘only’
thing that seems suspicious is the number of lines like this:
java  28388  tomcat  509u  sock  0,5  1208541  can’t identify protocol
The server that has been up the longest has 907 of these. The other
server has 407. Certainly seems like an issue.
Any thoughts on this?
Ok, I think we should escalate this…
- Try it with JRuby 1.5…might take a little hacking, but it should be doable
- Open a bug and describe any code of yours that might be opening sockets
- See if you can isolate what sockets are being kept open (i.e. what code creates them, why are they still alive)
We are using JRuby, Rails 2.3.5 and MySQL, and do not see this kind of
behaviour. We are also running the app in threadsafe mode, and our
database.yml looks like this:
production:
  adapter: <%= defined?(JRuby) ? "jdbcmysql" : "mysql" %>
  encoding: utf8
  database: database
  username: username
  password: password
  host: database_host
  pool: 15
We are using activerecord-jdbcmysql-adapter, version 0.9.3
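“Threadsafe mode” here just means the standard Rails 2.3 switch (a sketch
of the relevant line, nothing custom):

# config/environments/production.rb
config.threadsafe!   # Rails 2.3: serve requests from multiple threads,
                     # so the pool size above controls how many DB
                     # connections can be checked out at once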
Are you perhaps doing something odd with the mysql connections in your
app?
Albert
The obvious difference here is pooled vs. unpooled connections. It
sounds like after the initial set of connections times out, new
connections are being made but not pooled properly, and they start to
either leak (getting held by something) or simply aren’t returned to
the pool or closed properly (causing them to stack up for GC to
hopefully finalize).
This could also be a source of performance problems for apps with
pooled connections, since the cost of creating new connections for
every request is pretty high.
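For illustration only (a hypothetical snippet, not how AR-JDBC manages
this internally): with a healthy pool, every request checks a connection
out and checks it back in, so the failure mode above looks like checkouts
that are never returned or closed.

pool = ActiveRecord::Base.connection_pool
conn = pool.checkout            # grab a connection from the pool
begin
  conn.execute("SELECT 1")
ensure
  pool.checkin(conn)            # return it; skipping this leaves the
                                # descriptor for GC to clean up eventually
end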
I’ve been playing around trying to nail it down. I can restart Tomcat,
bop around the entire site, and just one of these “sock” entries appears
in lsof (there is always one right after I restart Tomcat).
If I come back a little while later, each of the next few hits makes the
count go up. Then it stops going up again for a while.
It’s a pretty straightforward app. The only connections being made are
to the database (on another machine). I can see (via netstat) the
connections on the db server being established and going away. On the
web server side I see (via netstat) that the connections eventually go
to the CLOSE_WAIT state (and never go away). I see 10 of these
connections, as I have jruby.max.runtimes set to 10.
It seems like after these connections “release” and go to CLOSE_WAIT,
subsequent hits to the site increase the number of “sock” entries seen
in lsof.
I’m pretty sure the build-up of these is related to the MySQL
connections at this point. I can duplicate it consistently by waiting
for the MySQL connections to time out and then hitting the site a bunch
of times. The number in lsof always goes up by the number of MySQL
connections that had timed out.
I am using JDBC. My settings look like this:
adapter: jdbc
driver: com.mysql.jdbc.Driver
url: jdbc:mysql://db1:3306/db_development
I found that I can duplicate this on a test machine that has MySQL on
the same machine as the web server, so it doesn’t have to do with remote
connections.
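One way to watch this from a Rails console while reproducing it (a
hypothetical check; connection_pool and verify_active_connections! are
the Rails 2.x-era APIs, as far as I know):

pool = ActiveRecord::Base.connection_pool
puts "connections held by the pool: #{pool.connections.size}"

# After letting MySQL time the connections out, ask ActiveRecord to
# verify/reconnect them rather than letting stale ones pile up:
ActiveRecord::Base.verify_active_connections!
puts "after verify: #{pool.connections.size}"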
Any thoughts on this? Is it possible 1.5 could help this situation? I
will open a bug describing this as well.
Albert R. wrote:
Are you perhaps doing something odd with the mysql connections in your
app?
Albert
No, just the standard Rails stuff. I set pool to 5, retested, and I can
still duplicate the problem. Are you waiting for the SQL connections to
time out and then running something like “lsof -c java | grep protocol
| wc -l” to see if the count goes up? If your site is busy and the
connections never time out, then I assume you’d never see this problem.
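Something like this little watch loop makes the trend easy to spot (a
hypothetical helper that just wraps the same sort of lsof command):

# Print the count of unidentified sockets held by the java process
# once a minute.
loop do
  count = `lsof -c java 2>/dev/null | grep -c "identify protocol"`.strip
  puts "#{Time.now}  unidentified sockets: #{count}"
  sleep 60
end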
Our activerecord-jdbcmysql-adapter is a tad older than yours (0.9.2). I
could try an upgrade but I’m not optimistic.
I tried the app with the latest gems and I can still repro. BTW, I found
that I can repro faster by restarting MySQL instead of waiting for the
connections to time out.
I was going to try with JRuby 1.5 but there is a bit of a blocker. I use
Warbler, which depends on the jruby-jars gem. This contains the jars for
JRuby 1.4 (jruby-core-1.4.0.jar and jruby-stdlib-1.4.0.jar). Are these
jars available somewhere for 1.5?
Thanks for the help.
Charles Nutter wrote:
Well, let me put it this way…there’s a bug somewhere. Try the latest
of everything, and if you still have a problem, we should escalate it.
I suspect AR-JDBC most likely.
Has anyone had any success in solving this one?
We are running JRuby 1.4.0 with the 0.9.4 JDBC adapter and are
experiencing it a couple of times a week.
I just ran lsof on the box where it recently happened, and the following
two kinds of entries seem to be the problem we are having:
java  22279  pbyintra  1002r  0000  0,10  0  55794996  eventpoll
java  22279  pbyintra  1003r  FIFO  0,6   55794997  pipe
Our open-files ulimit is currently at 1024, which we are going to bump
up, but of the two entries above, the pipe entry is listed 594 times and
the eventpoll entry 294 times.
Anyone know what they are or have any suggestions?
Thanks
Ryan
Is there a bug filed for this yet? We should try to prioritize it and
find what’s happening.
The released jruby-jars gem should have 1.5 jars in them…is that not
the case? (I’m trying to install it, but the wifi at RailsConf is a
little slow at the moment)
I can second this issue. We recently switched to Trinidad, which uses
Tomcat (like the ticket http://jira.codehaus.org/browse/JRUBY-4767),
and started seeing these issues. We did not see this behaviour in
GlassFish v2 or when we switched to the gfgem.
We are on JRuby 1.5 right now, and we have to restart our app servers
about once every 4-5 days or we get the too-many-open-files exceptions.
The open files look like the ones in the ticket; lsof lists them as
“pipe”. Does anyone know of a good strategy to pin down exactly what
file/socket is open? All we get from lsof is “pipe”, which isn’t much
of a clue as to where the actual leak is.
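Two things that might help narrow it down (both rough sketches, assuming
Linux and a JRuby console you can poke at): on Linux each unclosed
java.nio Selector typically shows up in lsof as one eventpoll descriptor
plus a pipe pair, and ObjectSpace (enabled at boot with -X+O, as I
recall) can be walked to find unclosed IO objects.

require 'java'

# 1) Show where eventpoll/pipe descriptors can come from: every unclosed
#    NIO selector holds some. (Purely illustrative, run from a JRuby console.)
before = Dir.entries("/proc/#{Process.pid}/fd").size
selectors = (1..50).map { java.nio.channels.Selector.open }
after = Dir.entries("/proc/#{Process.pid}/fd").size
puts "descriptor count grew by #{after - before}"
selectors.each { |s| s.close }   # closing them gives the descriptors back

# 2) Hunt for unclosed Ruby IO objects. ObjectSpace must be enabled when
#    JRuby boots, otherwise each_object sees nothing useful.
GC.start
ObjectSpace.each_object(IO) { |io| puts io.inspect unless io.closed? }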
Albert
As I dig into this, it would appear I’m experiencing the behavior listed
at the start of JRUBY-4767 (http://jira.codehaus.org/browse/JRUBY-4767)
which might be the bug that was opened for this post in the first place.
- System under high load and descriptors are not being cleaned up fast
enough (like if they’re not explicitly closed and the code leaves GC
to clean them up).
I kind of wish JRuby would handle this like MRI currently does. That is,
when it can’t allocate a descriptor, it calls GC.start and tries again
(if the second attempt fails, it dies).
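Roughly the behaviour being described, as a plain-Ruby sketch (not how
MRI implements it internally):

def open_with_gc_retry(path)
  File.open(path)
rescue Errno::EMFILE, Errno::ENFILE
  GC.start            # hope finalizers release any leaked descriptors
  File.open(path)     # a second failure propagates, as described above
end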
Possibly this should be a spec, since that’s how MRI acts…
-rp
This sounds plausible for us as well; we see this behaviour when our web
service times out. I haven’t verified this, though.
Albert
In continuing to try to determine where in our application we are
building up these file descriptors, I created a test simulating one of
our web service calls. Prior to upgrading to 1.4, we had tons of trouble
with timeouts and this call not timing out. With the timeout logic
having been re-implemented, we upgraded to 1.4 and the calls indeed
started timing out, which solved a huge problem for us.
In any case, I set up a case where we make the call but simply put a
sleep statement in the web service to ensure our call from JRuby would
indeed time out. I saw pretty clearly the “pipe” descriptors building up
almost every time I accessed the page and forced the JRuby timeout.
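A simplified sketch of that kind of call (the URL, timeout value and
rescue handling are placeholders, not the real service code):

require 'net/http'
require 'uri'
require 'timeout'

begin
  Timeout.timeout(2) do
    # the service side sleeps longer than 2 seconds, so this always times out
    Net::HTTP.get(URI.parse('http://slow-service.example/endpoint'))
  end
rescue Timeout::Error
  # the request is abandoned here; the underlying socket/pipe may be left
  # for GC to clean up, which is where the lsof "pipe" entries build up
end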
Does anyone else experiencing this issue notice if they have web service
calls timing out?
Thanks
Ryan
I have added a tarball with instructions on how to reproduce this
problem. It is kind of involved to get it running, but it displays the
problem on my system.
http://jira.codehaus.org/browse/JRUBY-4767
and
http://jira.codehaus.org/secure/attachment/49617/leakypipes.tar