Mongrels stop responding

I have a Rails 1.2.3 app (due for an update, I know) running on a
Mongrel 1.1.4 behind Apache 2.2.3 mod_proxy on Debian 4.0. It keeps
hanging up for no reason I can tell. I have tried everything that anyone
has ever hinted at being a solution for this problem:

  • The application doesn’t use MySQL, so setting the connection timeout
    won’t help.
  • Using AR not PStore for sessions.
  • Logger isn’t rotating the logs itself.
  • I added the no-keepalive and force 1.0 setenv commands to apache.

In the apache log, I get the typical “proxy: error reading status line
from remote server localhost”. lsof shows lots of sockets in CLOSE_WAIT.
strace gives this sequence over and over:

time(NULL) = 1207001155
select(4, [3], [], [], {0, 0}) = 0 (Timeout)
sigprocmask(SIG_BLOCK, NULL, []) = 0
time(NULL) = 1207001155
gettimeofday({1207001155, 980832}, NULL) = 0
gettimeofday({1207001155, 980855}, NULL) = 0
select(4, [3], [], [], {0, 999977}) = 0 (Timeout)
gettimeofday({1207001156, 980679}, NULL) = 0
select(4, [3], [], [], {0, 153}) = 0 (Timeout)

I tried letting them run after sending a USR1, but that doesn’t seem to
generate any extra information.

At first, I thought it would only happen if the Mongrels were allowed to
sit idle - similar to the MySQL problem others reported - but then I saw
one go silent within a half hour of a restart.

Any suggestions?

Gdb the stuck mongrel and force it to raise a backtrace. It looks like
maybe you’re stuck in a timeout loop somewhere that never excepts
properly?

Evan

Oh, I meant a Ruby backtrace:
http://eigenclass.org/hiki.rb?ruby+live+process+introspection

Evan

Evan W. wrote:

Oh, I meant a Ruby backtrace:

I believe I’ve blown this opportunity. I got my gdb environment set up
as described in the article, attached to the process, and did
“rb_backtrace”. All I got out of it was “[Switching to Thread
-1211282784 (LWP 31635)] $1 = 32”.

Then I noticed the article says something about redirecting stdout. I
issued that command and then did the backtrace again, but the only thing
I got that time was “$3 = 32”.

In an effort to make sure the ruby debugging commands were working
properly, I tried one of the “easier” ones:

(gdb) rb_object_counts

Program received signal SIGSEGV, Segmentation fault.
0xb7eff6fe in st_lookup () from /usr/lib/libruby1.8.so.1.8

When I exited gdb, the hung mongrel had died.

So it seems that any meaningful output was lost because I failed to
redirect stdout first…? I checked the log files (where I’d expect
stdout to go if it weren’t redirected), and there is no additional
information there either.

Please correct me if I’m wrong, but the next chance I get, it seems the
proper sequence of commands goes:

gdb ruby PID_of_bad_mongrel
(gdb) session-ruby
(gdb) redirect_stdout
(gdb) rb_backtrace

Thanks.

A second app on the same box has gone belly-up. This app does use mysql,
but has all of the appropriate fixes put in place as with the first app.

This time I was able to get the backtrace out of the /tmp file:

["/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/…/lib/mongrel/configurator.rb:285:in
run'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/configurator.rb:285:inloop’",
“/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/…/lib/mongrel/configurator.rb:285:in
run'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/configurator.rb:285:ininitialize’”,
“/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/…/lib/mongrel/configurator.rb:285:in
new'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/configurator.rb:285:inrun’”,
“/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/mongrel_rails:128:in
run'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/command.rb:212:inrun’”,
“/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/mongrel_rails:281”]

This app is the beta test copy of our “intranet portal” application. It
never has problems on the production server.

For reference, the other application is referred to internally as
“paperless-mpcs”.

Stuck in the sleeper thread… that’s odd:

285: $mongrel_sleeper_thread = Thread.new { loop { sleep 1 } }

Kirk, any ideas?

Evan

You said this app was not using MySQL. Is it using Oracle? if so,
ok. Otherwise, why did your trace wander through the Oracle client
libraries?

(I spent a lot of time one day trying to trace some Oracle library
oddities. The very un-Linux like solution of rebotting the VM fixed
things but I am now quite leary of unexplained Oracle detours. There
is also a very odd interaction with LDAP and the Oracle LDAP libraries

Quoting M. Kinney [email protected]:

Evan W. wrote:

Gdb the stuck mongrel and force it to raise a backtrace.

I had to wait for one of them to hang again. This time, no CLOSE_WAIT.
Also note, I’ve never used gdb before. I googled up how to attach to a
PID and hit the “backtrace” command. If there’s anything else I need to
do, the GDB session is still available to me for a limited time (until
enough people complain about the app being down).

GNU gdb 6.4.90-debian
Copyright © 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB. Type “show warranty” for
details.
This GDB was configured as “i486-linux-gnu”.
Attaching to process 31635
Reading symbols from /usr/bin/ruby1.8…(no debugging symbols
found)…done.
Using host libthread_db library “/lib/tls/i686/cmov/libthread_db.so.1”.
Reading symbols from /usr/lib/libruby1.8.so.1.8…(no debugging symbols
found)…done.
Loaded symbols for /usr/lib/libruby1.8.so.1.8
Reading symbols from /lib/tls/i686/cmov/libpthread.so.0…(no debugging
symbols found)…done.
[Thread debugging using libthread_db enabled]
[New Thread -1211282784 (LWP 31635)]
[New Thread -1262781520 (LWP 31645)]
Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0
Reading symbols from /lib/tls/i686/cmov/libdl.so.2…(no debugging
symbols found)…done.
Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
Reading symbols from /lib/tls/i686/cmov/libcrypt.so.1…
(no debugging symbols found)…done.
Loaded symbols for /lib/tls/i686/cmov/libcrypt.so.1
Reading symbols from /lib/tls/i686/cmov/libm.so.6…(no debugging
symbols found)…done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/tls/i686/cmov/libc.so.6…(no debugging
symbols found)…done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/ruby/1.8/i486-linux/etc.so…
(no debugging symbols found)…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/etc.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/stringio.so…(no
debugging symbols found)…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/stringio.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/syck.so…(no
debugging symbols found)…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/syck.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/socket.so…(no
debugging symbols found)…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/socket.so
Reading symbols from
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/lib/http11.so…done.
Loaded symbols for
/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/…/lib/http11.so
Reading symbols from
/usr/lib/ruby/gems/1.8/gems/fastthread-1.0.1/lib/fastthread.so…done.
Loaded symbols for
/usr/lib/ruby/gems/1.8/gems/fastthread-1.0.1/lib/fastthread.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/zlib.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/zlib.so
Reading symbols from /usr/lib/libz.so.1…done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/ruby/1.8/i486-linux/strscan.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/strscan.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/bigdecimal.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/bigdecimal.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/nkf.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/nkf.so
Reading symbols from
/usr/local/lib/site_ruby/1.8/i486-linux/oci8lib.so…done.
Loaded symbols for /usr/local/lib/site_ruby/1.8/i486-linux/oci8lib.so
Reading symbols from
/usr/lib/oracle/11.1.0.1/client/lib/libclntsh.so.11.1…done.
Loaded symbols for /usr/lib/oracle/11.1.0.1/client/lib/libclntsh.so.11.1
Reading symbols from
/usr/lib/oracle/11.1.0.1/client/lib/libnnz11.so…done.
Loaded symbols for /usr/lib/oracle/11.1.0.1/client/lib/libnnz11.so
Reading symbols from /lib/tls/i686/cmov/libnsl.so.1…done.
Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1
Reading symbols from /usr/lib/libaio.so.1…done.
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from
/usr/lib/oracle/11.1.0.1/client/lib/libociicus.so…done.
Loaded symbols for /usr/lib/oracle/11.1.0.1/client/lib/libociicus.so
Reading symbols from /lib/tls/i686/cmov/libnss_compat.so.2…done.
Loaded symbols for /lib/tls/i686/cmov/libnss_compat.so.2
Reading symbols from /lib/tls/i686/cmov/libnss_nis.so.2…done.
Loaded symbols for /lib/tls/i686/cmov/libnss_nis.so.2
Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2…done.
Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
Reading symbols from /usr/lib/ruby/1.8/i486-linux/fcntl.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/fcntl.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/digest/md5.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/digest/md5.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/digest.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/digest.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/racc/cparse.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/racc/cparse.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/iconv.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/iconv.so
Reading symbols from /usr/lib/ruby/1.8/i486-linux/openssl.so…done.
Loaded symbols for /usr/lib/ruby/1.8/i486-linux/openssl.so
Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8…done.
Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8…done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
Reading symbols from /lib/tls/i686/cmov/libnss_dns.so.2…done.
Loaded symbols for /lib/tls/i686/cmov/libnss_dns.so.2
Reading symbols from /lib/tls/i686/cmov/libresolv.so.2…done.
Loaded symbols for /lib/tls/i686/cmov/libresolv.so.2
Failed to read a valid object file image from memory.
0xb7f45410 in ?? ()
(gdb) backtrace
#0 0xb7f45410 in ?? ()
#1 0xbf9f0fa8 in ?? ()
#2 0xbf9f0e90 in ?? ()
#3 0xbf9f0f10 in ?? ()
#4 0xb7d99131 in select () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7e99f91 in rb_thread_schedule () from /usr/lib/libruby1.8.so.1.8
#6 0xb7e9ac80 in rb_thread_wait_for () from /usr/lib/libruby1.8.so.1.8
#7 0xb7ee750b in Init_process () from /usr/lib/libruby1.8.so.1.8
#8 0xb7e94653 in rb_provide () from /usr/lib/libruby1.8.so.1.8
#9 0xb7e9bdc2 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#10 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#11 0xb7ea4997 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#12 0xb7ea75be in rb_apply () from /usr/lib/libruby1.8.so.1.8
#13 0xb7ea88bb in rb_apply () from /usr/lib/libruby1.8.so.1.8
#14 0xb7e9463e in rb_provide () from /usr/lib/libruby1.8.so.1.8
#15 0xb7e9bdc2 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#16 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#17 0xb7ea4997 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#18 0xb7ea5d0c in rb_apply () from /usr/lib/libruby1.8.so.1.8
#19 0xb7ea75be in rb_apply () from /usr/lib/libruby1.8.so.1.8
#20 0xb7e9faf4 in ruby_stop () from /usr/lib/libruby1.8.so.1.8
#21 0xb7e94247 in rb_provide () from /usr/lib/libruby1.8.so.1.8
#22 0xb7e9bdc2 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#23 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#24 0xb7e9ce47 in rb_obj_call_init () from /usr/lib/libruby1.8.so.1.8
#25 0xb7e9cea2 in rb_obj_call_init () from /usr/lib/libruby1.8.so.1.8
#26 0xb7e94653 in rb_provide () from /usr/lib/libruby1.8.so.1.8
#27 0xb7e9bdc2 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#28 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#29 0xb7ea486c in rb_apply () from /usr/lib/libruby1.8.so.1.8
#30 0xb7ea5d0c in rb_apply () from /usr/lib/libruby1.8.so.1.8
#31 0xb7ea2e39 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#32 0xb7e9c6e1 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#33 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#34 0xb7ea486c in rb_apply () from /usr/lib/libruby1.8.so.1.8
#35 0xb7e9c6e1 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#36 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#37 0xb7ea486c in rb_apply () from /usr/lib/libruby1.8.so.1.8
#38 0xb7e9c6e1 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#39 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#40 0xb7ea486c in rb_apply () from /usr/lib/libruby1.8.so.1.8
#41 0xb7ea4d31 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#42 0xb7ea9baa in rb_load () from /usr/lib/libruby1.8.so.1.8
#43 0xb7eaa487 in rb_f_require () from /usr/lib/libruby1.8.so.1.8
#44 0xb7e94653 in rb_provide () from /usr/lib/libruby1.8.so.1.8
#45 0xb7e9bdc2 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#46 0xb7e9cb38 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#47 0xb7ea4997 in rb_apply () from /usr/lib/libruby1.8.so.1.8
—Type to continue, or q to quit—
#48 0xb7eaa618 in rb_load_protect () from /usr/lib/libruby1.8.so.1.8
#49 0xb7eaa662 in ruby_exec () from /usr/lib/libruby1.8.so.1.8
#50 0xb7eaa69f in ruby_run () from /usr/lib/libruby1.8.so.1.8
#51 0x08048612 in main ()
(gdb)

Cynthia K. wrote:

You said this app was not using MySQL. Is it using Oracle? if so,
ok. Otherwise, why did your trace wander through the Oracle client
libraries?

The paperless-mpcs app does use Oracle. This app is where the gdb
backtrace came from. I am waiting for this app to fail again so that I
can get a ruby backtrace.

The intranet portal app uses MySQL. This is the app that generated the
ruby backtrace that shows mongrel “stuck in the sleeper thread”.

(I spent a lot of time one day trying to trace some Oracle library
oddities. The very un-Linux like solution of rebotting the VM fixed
things but I am now quite leary of unexplained Oracle detours.

Are you suggesting I reboot my server? I can make that happen. I would
hate to call that a “fix” though. As you say, very un-Linux-like.

There
is also a very odd interaction with LDAP and the Oracle LDAP libraries

I don’t use Ruby/LDAP on this server yet, but I will in the future.
Thanks for pointing that out, you’ve almost certainly saved me some
serious headaches!

Just as I was preparing to leave for the day, another Mongrel stopped
working. This time, the paperless-mpcs app. Now, I may still not be
doing this gdb thing right, because rb_backtrace didn’t generate any
output. But then I tried ‘eval “caller”’ and this is what I got:

["/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/…/lib/mongrel/configurator.rb:285:in
run'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/configurator.rb:285:inloop’", “/usr/lib/ruby/gems/1.
8/gems/mongrel-1.1.4/bin/…/lib/mongrel/configurator.rb:285:in run'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/configurator.rb:285:ininitialize’”, “/usr/lib/ruby/gems/1.8/gems/mongrel-1.
1.4/bin/…/lib/mongrel/configurator.rb:285:in new'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/configurator.rb:285:inrun’”, “/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/mongrel_rails:12
8:in run'", "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/command.rb:212:inrun’”,
“/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/mongrel_rails:281”,
“/usr/bin/mongrel_rails:16:in `load’”, “/u
sr/bin/mongrel_rails:16”]

Seems to be the same thing that’s hanging up the portal app?

Thanks in advance…

I hadn’t heard about threading issues. What if I’m still on 1.8.5 (I’m
not the one who asked the question, but I do use mongrels and
threading)–should I be concerned?

Jonathan

Ezra Z. wrote:

"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/…/lib/mongrel/
12
Thanks in advance…\


Mongrel-users mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/mongrel-users


Jonathan R.
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu

On Tue, Apr 1, 2008 at 3:56 PM, Jonathan R. [email protected]
wrote:

I hadn’t heard about threading issues. What if I’m still on 1.8.5 (I’m
not the one who asked the question, but I do use mongrels and
threading)–should I be concerned?

There are two issues.

When 1.8.5 was out, it was discovered that Array#shift had a stupid
bug in it that effectively leaked memory. Mutex uses an Array to keep
a queue of threads that are waiting for a mutex, and it uses
Array#push and Array#shift to put things into the queue and take them
out of it…

There was a crude workaround, which was to monkey patch Mutex to use
Array#unshift and Array#pop, instead, which didn’t leak.

However, an intrepid Rubyist took this opportunity to reimplement the
Ruby threading primitives as a C extension instead of pure Ruby.
Thus, the fastthread library was born.

Fastthread was accepted into Ruby 1.8.6 to replace the pure ruby
primitives, but the first several releases/patch levels had bugs.

Mongrel will use fastthread, so Ruby 1.8.5 w/ fastthread is a
perfectly fine platform.

If you are on Ruby 1.8.6, though, you need to be on p110 or higher, as
that patch level has both properly working threading primitives, and
has Array#shift fixed so it doesn’t leak.

Kirk H.

On Apr 1, 2008, at 2:45 PM, Michael K. wrote:

loop'", "/usr/lib/ruby/gems/1. "/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/../lib/mongrel/ command.rb:212:in run’",
“/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.4/bin/mongrel_rails:281”,
“/usr/bin/mongrel_rails:16:in `load’”, “/u
sr/bin/mongrel_rails:16”]

Seems to be the same thing that’s hanging up the portal app?

Thanks in advance…\

Michael-

What version and patch level of ruby are you running? If you are
using 1.8.6 you need at least patch level 110, any patch levels
before that have threading issues and should not be used.

Cheers-

Ezra Z. wrote:

What version and patch level of ruby are you running? If you are
using 1.8.6 you need at least patch level 110, any patch levels
before that have threading issues and should not be used.

ruby --version reports 1.8.5 (2006-08-25)

That said, this is a Debian 4.0 system so they may have applied
backported security patches. I don’t know how to get it to report the
patch level.

I do also have fastthread 1.0.1 and cgi_multipart_eof_fix 2.5.0
installed.

Quoting M. Kinney [email protected]:

(I spent a lot of time one day trying to trace some Oracle library
oddities. The very un-Linux like solution of rebotting the VM fixed
things but I am now quite leary of unexplained Oracle detours.

Are you suggesting I reboot my server? I can make that happen. I would
hate to call that a “fix” though. As you say, very un-Linux-like.

No I am not suggesting a server reboot if you can restart your mongrel
otherwise. I just mentioned it because we didn’t even think about
rebooting the machine just because we couldn’t restart our
mongrels. It was a couple of hours before we tried that irrational -
but immediately successful - strategy.


Cynthia K.

I think I may have it:

The “portal” application has been well behaved since I installed the
mysql gem v2.7. It had been using the AR internal adapter.

The “paperless” application needed an earlier version of the
composite_primary_keys gem. Since downgrading, the app has not hung up.

Thanks.

I looked up the Debian package info for the ruby1.8 deb and it looks
like they’ve applied a series of numbered patches up to 164.