In May I had a problem with mongrels suddenly consuming huge CPU resources
for a minute or two and then returning to normal (the load average spikes
up to 3.8 and then drops back to a regular 0.2 over the course of 5
minutes, then again half an hour later, or 4 hours later; there's no
predictable rhythm).
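To look for a pattern in the spikes, it might help to log exactly when they start and stop. A minimal sketch, assuming a Linux box (it reads /proc/loadavg) and an arbitrary, hypothetical spike threshold of 1.5; the names parse_loadavg, is_spike and watch are made up for illustration:

```python
import time
from datetime import datetime


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from a /proc/loadavg line."""
    fields = text.split()
    return tuple(float(f) for f in fields[:3])


def is_spike(one_min, threshold=1.5):
    """Hypothetical rule: treat a 1-minute load above the threshold as a spike."""
    return one_min > threshold


def watch(path="/proc/loadavg", interval=10, threshold=1.5):
    """Poll the load average forever and print a timestamp when a spike starts or ends."""
    in_spike = False
    while True:
        with open(path) as f:
            one, five, fifteen = parse_loadavg(f.read())
        spiking = is_spike(one, threshold)
        if spiking and not in_spike:
            print(f"{datetime.now().isoformat()} spike start: load {one:.2f}")
        elif in_spike and not spiking:
            print(f"{datetime.now().isoformat()} spike end: load {one:.2f}")
        in_spike = spiking
        time.sleep(interval)
```

Calling watch() from a long-running shell session would print one timestamped line at the start and end of each spike, which could then be lined up against the mongrel and Litespeed logs.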
I posted to the Litespeed forums because I thought the problem was there,
but didn't get far. A week later we migrated hosting companies and the
problem was gone. Now it's returned. We make a lot of changes, but I've
gone over the repo for the last few weeks and can't see anything
structural that should affect it. It only happens with our main front-end
app (the other two are fine), but it happens at all times of day (and
night), so it doesn't seem to be triggered by heavy load. Basically, a
mongrel gets stuck on one or two cached files for a few minutes (but it
still handles other requests fine; I can ping specific Rails pages on all
mongrels during this period).
Running strace -e read,write,close on the stuck mongrel produces this
repeatedly the whole time (a short excerpt of thousands of lines):
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_covehithe_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image"..., 16384) = 473
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_rowde_wiltshire_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/p"..., 16384) = 471
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_cove_south_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, imag"..., 16384) = 474
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_covehithe_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image"..., 16384) = 473
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_rowde_wiltshire_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/p"..., 16384) = 471
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_cove_south_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, imag"..., 16384) = 474
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_covehithe_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image"..., 16384) = 473
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_rowde_wiltshire_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/p"..., 16384) = 471
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_cove_south_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, imag"..., 16384) = 474
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_covehithe_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image"..., 16384) = 473
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_rowde_wiltshire_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/p"..., 16384) = 471
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_cove_south_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, imag"..., 16384) = 474
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
read(5, "GET /flower_delivery/florists_in_covehithe_suffolk_england_uk HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image"..., 16384) = 473
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
The file it's trying to get is page-cached and exists/is fine (I can even
go to the URL while this is going on).
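Since a full capture runs to thousands of lines, a small script could tally which paths keep coming back and how often a successful close(5) is immediately followed by a close(5) that fails with EBADF (which looks like the same descriptor being closed twice). A rough sketch, assuming the raw strace output file with one syscall per line; summarize and the three regexes are hypothetical helpers, not part of any tool:

```python
import re
from collections import Counter

# Request path inside a strace read() line, e.g.
#   read(5, "GET /flower_delivery/... HTTP/1.1\nAccept: ..."..., 16384) = 473
REQUEST_RE = re.compile(r'read\((\d+), "(?:GET|POST|HEAD) ([^\s"]+)')
# A close() that succeeded, and a close() that failed with EBADF.
CLOSE_OK_RE = re.compile(r'close\((\d+)\)\s*=\s*0')
CLOSE_BAD_RE = re.compile(r'close\((\d+)\)\s*=\s*-1 EBADF')


def summarize(lines):
    """Count requested paths, plus EBADF closes that directly follow a good close of the same fd."""
    paths = Counter()
    double_closes = 0
    last_closed = None  # fd closed successfully on the previous line, if any
    for line in lines:
        req = REQUEST_RE.search(line)
        if req:
            paths[req.group(2)] += 1
        ok = CLOSE_OK_RE.search(line)
        bad = CLOSE_BAD_RE.search(line)
        if bad and last_closed == bad.group(1):
            double_closes += 1
        last_closed = ok.group(1) if ok else None
    return paths, double_closes
```

For example, summarize(open("mongrel.strace")) (hypothetical file name) would return a Counter of request paths and the number of double closes seen in that capture.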
Could it still be a problem with Litespeed (actually requesting this file
many times?) Litespeed's CPU usage does go up during this period, but
stracing it doesn't give anything useful.
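One way to narrow down whether Litespeed itself is re-sending these requests might be to compare how often the stuck URLs appear in its access log against how often mongrel reads them in the strace capture. If clients only hit a path once or twice a minute while mongrel reads it over and over, the repetition is probably happening between Litespeed and mongrel. A sketch that counts hits per minute and path, assuming the access log uses the Common/Combined Log Format (this depends on your Litespeed log configuration); hits_per_minute and the /flower_delivery/ prefix default are illustrative:

```python
import re
from collections import Counter

# Assumes Common/Combined Log Format; an illustrative line:
#   203.0.113.7 - - [01/Jan/2000:12:00:01 +0000] "GET /flower_delivery/x HTTP/1.1" 200 100
# Group 1 is the timestamp truncated to the minute, group 2 the request path.
CLF_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]*\] "(?:GET|POST|HEAD) ([^\s"]+)'
)


def hits_per_minute(lines, prefix="/flower_delivery/"):
    """Count requests per (minute, path) for paths starting with prefix."""
    counts = Counter()
    for line in lines:
        m = CLF_RE.search(line)
        if m and m.group(2).startswith(prefix):
            counts[(m.group(1), m.group(2))] += 1
    return counts
```

Sorting the result with counts.most_common() would put any path that is being hammered within a single minute at the top.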
Thanks for any tips/directions.
Zach