Is r1303 unstable for others, too?

Hej!

I’ve just updated Typo to r1303 today. After migrating the database to
version 58, everything seemed to work fine. Unfortunately my FCGI
instance seems to die after a few minutes, but I don’t get any entry in
any log/ file. Should I look somewhere else?

I’m running on Textdrive, if that makes any difference. But I don’t
think I’m hitting the memory limit, since I’m not getting any messages
from the process that kills big processes (what was it called again?). At
least I don’t think I am. It should write something into a file in my
home dir, shouldn’t it?

Also, I’m using the new tmcode macro. Maybe that’s part of my problem?

Urban

“Urban H.” [email protected] writes:

It should write something into a file in my home dir, shouldn’t it?
I don’t know about the workings of TextDrive; I do know they have
pretty tight limits on memory use, but I don’t know where you’d find
out about what processes got reaped.

Certainly dying without informing the logfile of anything smacks of
being ‘kill -9’d by a resource limiter.

Do you get anything at all added to your logfile?

Also, I’m using the new tmcode macro. Maybe that’s part of my problem?

Possibly. I’ve not looked at its workings to be honest.

If it was killed by TxD’s samurai process, then you will see a
process_watchdog.log in your home directory which lists why/when/what
was killed.

-Linda

“Urban H.” [email protected] writes:

At least I don’t think I am. It should write something into a file in my
home dir, shouldn’t it?

Also, I’m using the new tmcode macro. Maybe that’s part of my problem?

What happens if you roll back to r1299?

That takes it back to Rails 1.1.6. If it’s stable, I’d appreciate it
if you could then step forward to r1300 and check that for stability.
If that’s stable, move forward one step at a time, checking each version
for stability, which should help us nail down which specific changes
are responsible for what I’m assuming is a memory leak issue.

“Linda D.” [email protected] writes:

If it was killed by TxD’s samurai process, then you will see a
process_watchdog.log in your home directory which lists why/when/what
was killed.

Fair enough. I wonder what is causing it, then. I can’t replicate the
issue here at the moment, but I’m giving whiteboards a hard look
because they don’t quite work right.

On Nov 24, 2006, at 20:27, Linda D. wrote:

If it was killed by TxD’s samurai process, then you will see a
process_watchdog.log in your home directory which lists why/when/what
was killed.

Well, there’s no such file. So that’s not what is happening, it seems.

Thanks for the tip.

Urban

On Nov 24, 2006, at 20:24, Piers C. wrote:

What happens if you roll back to r1299?

That takes it back to Rails 1.1.6. If it’s stable, I’d appreciate it
if you could then step forward to r1300 and check that for stability.
If that’s stable, move forward one step at a time, checking each version
for stability, which should help us nail down which specific changes
are responsible for what I’m assuming is a memory leak issue.

I’ll try tomorrow. Is there anything special to be aware of? Like
having to migrate the database back?

Urban

On 11/26/2006, “Piers C.” [email protected] wrote:

are responsible for what I’m assuming is a memory leak issue.

I’ll try tomorrow. Is there anything special to be aware of? Like
having to migrate the database back?

I did this, and r1299 seems to be stable, at least with the test I used
to trigger this memory leak:

  1. Go to homepage
  2. Go to main admin page
  3. Click empty fragment cache
  4. Reload homepage
  5. Boom!

When I do this at r1300 I get the following on my console:

[FATAL] failed to allocate memory

Neither the fastcgi.crash.log nor the production.log contains any error
messages. Should I try switching to development mode? Does that give more
output?

Urban

“Urban H.” [email protected] writes:

Should I try switching to development mode? Does that give more output?

Development probably doesn’t give more output, and probably breaks, if
anything, earlier. But it might be worth trying.

One other thing to try is to change blog.rb, text_filter.rb and
user.rb back to using ActiveRecord::Base rather than CachedModel as
their super classes.
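
For concreteness, a minimal sketch of that change (class names inferred
from the file names, not copied from the Typo source; the class bodies
stay exactly as they are):

# app/models/blog.rb (was: class Blog < CachedModel)
class Blog < ActiveRecord::Base
  # ... existing model code unchanged ...
end

# The same one-line superclass change then goes into
# app/models/text_filter.rb (TextFilter) and app/models/user.rb (User).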

If you do remove the CachedModel inheritance stuff, you can also
modify the beginning of app/controllers/application.rb so the opening
stanza looks like:

class ApplicationController < ActionController::Base
  include LoginSystem

  # was: before_filter :reset_local_cache, :fire_triggers
  before_filter :fire_triggers

  # was: after_filter :reset_local_cache

I’m afraid this sort of debugging is unlikely to be quick; I’m with a
different hosting provider, otherwise I could probably get at what’s up
a wee bit quicker. Thanks for your help with this.

Urban H. [email protected] writes:

I’ll try tomorrow. Is there anything special to be aware of? Like
having to migrate the database back?

Good point. Run rake db:migrate VERSION=55 before you do the rollback.

Kevin B. [email protected] writes:

Incidentally, Piers, if tmcode is causing a problem (such as increased
memory usage) I would say that points to an issue with the whiteboard
implementation. Since you said you’re giving whiteboards a hard look,
hopefully if that’s the case you’ll find it.

On the other hand, tmcode could just be a red herring.

Seems to be; r1299 works and r1300 doesn’t.

On 11/27/2006, “Piers C.” [email protected] wrote:

class ApplicationController < ActionController::Base
  include LoginSystem

  # was: before_filter :reset_local_cache, :fire_triggers
  before_filter :fire_triggers

  # was: after_filter :reset_local_cache

I’ve updated to r1324, changed the files you suggested but I still get
the same result. Attached are the log files.

Urban

“Urban H.” [email protected] writes:

I’ve updated to r1324, changed the files you suggested but I still get
the same result. Attached are the log files.

Ah… try running in production mode; development mode leaks memory.

Try going back to a vanilla r1324 and uncommenting the
RAILS_ENV=production line in config/environment.rb
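
For reference, that’s the line which ships commented out near the top of
the stock Rails config/environment.rb (the surrounding comment wording
varies a bit between Rails versions; this is a sketch, not Typo’s exact
file):

# config/environment.rb
# Uncomment below to force Rails into production mode
ENV['RAILS_ENV'] ||= 'production'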


“Urban H.” [email protected] writes:

On 12/9/2006, “Piers C.” [email protected] wrote:

Ah… try running in production mode; development mode leaks memory.

Try going back to a vanilla r1324 and uncommenting the
RAILS_ENV=production line in config/environment.rb

I actually run in production mode normally. I tried it again with r1324
(and with the scribbish theme instead of a theme not supplied in the Typo
tree) and I get the same error.

So, what do we know?

  1. Migrations went smoothly and you’re on schema version 61
  2. Textdrive isn’t killing stuff because it’s not putting any reports
    in your home directory
  3. Typo’s dying before it logs anything.

Does it always die in the same place?

On Dec 12, 2006, at 0:32, Piers C. wrote:

So, what do we know?

  1. Migrations went smoothly and you’re on schema version 61
  2. Textdrive isn’t killing stuff because it’s not putting any reports
    in your home directory
  3. Typo’s dying before it logs anything.

Sometimes I get the message “[FATAL] failed to allocate memory” on the
console that I started the FCGI on. BTW, maybe it’s of interest: I’m
running Typo as an FCGI process using lighttpd. I mean, I start the FCGI
using “spawn-fcgi” and tell lighttpd where to find the socket.

Does it always die in the same place?

Not really. It seems that the problem occurs whenever Typo tries to load
a new page that isn’t in the cache yet. Sometimes it works for one more
page, but once I try to load another page that isn’t in the cache (or so
I’m guessing) it blows up.

Urban

Urban H. [email protected] writes:

I’m running Typo as an FCGI process using lighttpd. I mean, I start the
FCGI using “spawn-fcgi” and tell lighttpd where to find the socket.

Oh crap. Memory leak. I hate memory leaks.

Does it always die in the same place?

Not really. It seems that the problem occurs whenever Typo tries to load
a new page that isn’t in the cache yet. Sometimes it works for one more
page, but once I try to load another page that isn’t in the cache (or so
I’m guessing) it blows up.

Bugger. Definitely a memory leak.

On Dec 23, 2006, at 12:15, Piers C. wrote:

Does it always die in the same place?

Not really. It seems that the problem occurs whenever Typo tries to load
a new page that isn’t in the cache yet. Sometimes it works for one more
page, but once I try to load another page that isn’t in the cache (or so
I’m guessing) it blows up.

Bugger. Definitely a memory leak.

Seems like. BTW, thanks for all the help tracking down this bug!

Urban

Urban H. [email protected] writes:

On Dec 23, 2006, at 12:15, Piers C. wrote:

Bugger. Definitely a memory leak.

Seems like. BTW, thanks for all the help tracking down this bug!

If only it were tracked down. Now we’ve got to find where we’re
leaking from, and that’s never fun.
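
One low-tech way to start narrowing it down on Ruby 1.8 / Rails 1.x is to
log per-class object counts after each request and diff them between
requests; whichever classes keep growing point at the leak. The sketch
below is just that, a sketch; the filter name and the TYPO_LEAK_HUNT flag
are invented for illustration:

# Hypothetical debugging aid, dropped into app/controllers/application.rb
class ApplicationController < ActionController::Base
  after_filter :log_object_counts if ENV['TYPO_LEAK_HUNT']

  private

  # Run a GC, count live objects per class, and log the 20 biggest
  # populations; classes whose counts climb request after request are
  # the likely leak.
  def log_object_counts
    GC.start
    counts = Hash.new(0)
    ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
    top = counts.sort_by { |klass, n| -n }.first(20)
    logger.info('objects: ' + top.map { |klass, n| "#{klass}=#{n}" }.join(' '))
  end
end

The ObjectSpace walk is slow, so it’s only something to switch on while
reproducing the crash, not to leave running all the time.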