I was wondering if anyone else has had a similar problem and knows why, or knows of a solution.

Basically, my mongrels seem to work fine. I am running three clusters, all of which are monitored by monit. Monit has the ability to restart a mongrel if it doesn't pass a port connection test. The problem is that some time after the clusters are started (approximately 6 to 20 hours), some of the mongrels get restarted by monit because monit can't connect to the mongrel's port. Not all of them at the same time, just some of them, sometimes. The server is pre-production and is getting no hits. Could that be the problem, meaning that once the server is live and under constant use the mongrels will keep working? Or could this be a monit issue?

Any help would be truly appreciated…
Chris
What do your logs say?
Why are the mongrels not responding?
Since you’re not in production, you should be able to pinpoint exactly
when and why they stopped responding.
I hit this too at the same kind of timeframe you mentioned. In my
case, the mongrel processes do become non-responsive, making monit
necessary to keep my webapp living + breathing. The problem occurs
on multiple machines: some running OpenSuse 64-bit and some running
Ubuntu Feisty Fawn 64-bit.
Some folks had suggested this is related to not using the mysql gem
for database access. This may be the case, but the mysql gem wasn’t
a possibility for me since it is very buggy in 64-bit (it crashed my
webapps). There is also an ActiveRecord timeout that is usually prescribed, but this had no effect for me.
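For reference, the ActiveRecord timeout that usually gets prescribed is a one-liner in config/environment.rb. This is only a sketch of the commonly quoted advice, from memory of the Rails 1.x API, so treat the exact setting name and value as assumptions rather than anything confirmed in this thread:

  # config/environment.rb
  # Connections that have sat idle longer than this many seconds get
  # re-verified (and reconnected if dead) before they are used again.
  # 14400 (4 hours) is the value that usually gets quoted, presumably
  # to stay under MySQL's default 8-hour wait_timeout.
  ActiveRecord::Base.verification_timeout = 14400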
I wonder… how many 64-bit mongrel users are out there?
> I hit this too at the same kind of timeframe you mentioned.
That sort of a delay – 6 to 20 hours is what the OP mentioned –
screams at me that the problem is probably related to the db handle
timing out. Even if you change the AR timeout value to 14400 (the
most often quoted value that I see), that is still just 4 hours. If
your process sits quiescent for 6 to 20 hours while the timeout on the
db handle is set at 4 hours, the db handle is still going to time out.
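If you want to confirm that this is what's happening, something along these lines in script/console will show the idle timeout the server is enforcing and let you raise it for the current session. This is just a sketch that assumes the MySQL adapter; on a stock MySQL, wait_timeout defaults to 28800 seconds, i.e. 8 hours.

  conn = ActiveRecord::Base.connection
  row  = conn.select_one("SHOW VARIABLES LIKE 'wait_timeout'")
  puts "server wait_timeout: #{row['Value']} seconds"
  # Example value only -- pick whatever margin you want.
  conn.execute("SET SESSION wait_timeout = 86400")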
> I wonder… how many 64-bit mongrel users are out there?
My old servers are 32 bit machines, but my new ones are all 64 bit
machines.
> That sort of a delay – 6 to 20 hours is what the OP mentioned –
> screams at me that the problem is probably related to the db handle
> timing out. Even if you change the AR timeout value to 14400 (the
> most often quoted value that I see), that is still just 4 hours. If
> your process sits quiescent for 6 to 20 hours while the timeout on the
> db handle is set at 4 hours, the db handle is still going to time out.
Thanks for this Kirk. Yep, I was using 14400. I’m switching this to
2 weeks: 1209600 and we’ll see if any further restarts are needed by
monit.
Thanks for the replies. I will try setting that db timeout to about a week and see how it does.
Chris
> Thanks for this Kirk. Yep, I was using 14400. I’m switching this to
> 2 weeks: 1209600 and we’ll see if any further restarts are needed by
> monit.
I’ve always wondered why 14400 is the number that is always passed
around when talking about extending the timeout period. Maybe there
is some db issue with a really long timeout like 1209600?
Which 64-bit OS are you running?
Right now I have Ubuntu and CentOS 64 bit machines.
> Maybe there is some db issue with a really long timeout like 1209600?

That was my thought. I set mine to 115200 (32 hours): more than enough, but not too crazy.
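All of these timeout values are plain seconds, by the way, so a quick irb one-liner keeps the conversions straight:

  [14400, 115200, 1209600].each { |secs| puts "#{secs} s = #{secs / 3600} hours" }
  # 14400 s = 4 hours, 115200 s = 32 hours, 1209600 s = 336 hours (2 weeks)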