What's the correct way to delete old sessions?

Ok, I tried searching for the answer to this but couldn’t find a
thread about it, so here goes…

I’m storing sessions in the database. Is there an accepted Rails way
of deleting the old sessions?

Unless I hear a better idea, I was going to write a simple class and
call it from a cronjob to delete sessions that have been inactive
for a specified period of time.

On Thu, Sep 17, 2009 at 7:46 PM, TRBNGR [email protected]
wrote:

Ok, I tried searching for the answer to this but couldn’t find a
thread about it, so here goes…

I’m using sessions that I am storing in the database, is there an
accepted rails way of deleting out the old sessions?

Unless I hear a better idea, I was going to write a simple class and
call it from a cronjob to delete sessions that have been inactive
after a specified period of time.

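class ApplicationController < ActionController::Base

  before_filter :clean

  # Purge sessions idle for more than an hour on roughly 1 in 10 requests.
  def clean
    # MySQL syntax. Subtracting two DATETIME values directly does not
    # yield seconds, so compare against an explicit interval instead.
    ActiveRecord::Base.connection.execute( "
      DELETE FROM sessions
      WHERE updated_at < NOW() - INTERVAL 1 HOUR
    " ) if rand( 10 ) == 0
  end

end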


Greg D.
http://destiney.com/

That will be processed for every request, which isn’t really
necessary. It probably won’t add THAT much overhead, but if you have a
high volume site you’d want to offload the session clearing into
something else.

  • you could make a rake task that deletes sessions older than a
    certain offset, and call that from cron
  • put it into your authentication code, rather than a before_filter,
    that way it’ll only run when someone logs in
  • your idea of a simple class that runs from cron, using
    script/runner. Seems like a valid approach.
  • use a scheduling gem/plugin.
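
The "simple class run from cron" option could look roughly like the sketch below. The class name SessionSweeper, its methods and the one-hour default are illustrative and not from this thread. It assumes the sessions table has an updated_at column stored in UTC, and that the connection object responds to execute(sql), as ActiveRecord::Base.connection does.

```ruby
# Hypothetical helper: the class name, method names and defaults are
# illustrative assumptions, not an established Rails API.
class SessionSweeper
  DEFAULT_MAX_AGE = 3600 # seconds of inactivity before a session is stale

  # connection: any object responding to #execute(sql), for example
  # ActiveRecord::Base.connection inside a Rails app.
  def initialize(connection, max_age = DEFAULT_MAX_AGE)
    @connection = connection
    @max_age = max_age
  end

  # Build the DELETE statement. The cutoff is computed in Ruby, in UTC
  # (assumed to match how updated_at is stored), which avoids
  # database-specific date arithmetic.
  def sql(now = Time.now)
    cutoff = (now.getutc - @max_age).strftime("%Y-%m-%d %H:%M:%S")
    "DELETE FROM sessions WHERE updated_at < '#{cutoff}'"
  end

  # Run the DELETE against the connection.
  def sweep(now = Time.now)
    @connection.execute(sql(now))
  end
end
```

From cron this would be invoked with something like script/runner 'SessionSweeper.new(ActiveRecord::Base.connection).sweep', at whatever interval suits the site.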

On 18 Sep 2009, at 21:01, Greg D. wrote:

That will be processed for every request,

No it won’t. It has randomization code that causes it to not run
most of the time. This is exactly how session gc should be handled.
It will ramp up proportionally with traffic.

Actually, that could be never or always. Relying on random numbers to
make decisions on whether to do something “most of the time” is a bad
idea.

The pointers given by Sax were more valid options. I’d personally
prefer the crontab option, since you can run it at a regular,
low-activity time, and cron is built in and already running on any
Unix-based OS, so it requires no extra processes. It could even be a
little script that runs outside of Rails, since it’s a bit of
overkill to start a whole Rails instance just to delete some records
in the sessions table.

Best regards

Peter De Berdt

On Fri, Sep 18, 2009 at 11:01 AM, sax [email protected] wrote:

That will be processed for every request,

No it won’t. It has randomization code that causes it to not run
most of the time. This is exactly how session gc should be handled.
It will ramp up proportionally with traffic.


Greg D.
http://destiney.com/

2009/9/18 Peter De Berdt [email protected]:

Actually that could be never or always, relying on random numbers to make
decisions on whether to do something “most of the time” is a bad idea.

Since quantum physics works entirely by probabilities (that is random
numbers) and microprocessors are built from semi-conductors which
operate because of the laws of quantum physics, it could be said that
any software is entirely dependent on the operation of random numbers.
Therefore however it is coded it is ‘relying on random numbers to
make decisions on whether to do something’.

Seriously, though, to suggest that something coded using random
numbers to be executed 1% of the time may either never run or always
run is incorrect. Assuming it is correctly coded of course.

Colin

Thx for the input guys. Yeah, I’m also not sure why I would want to run
the session removal on a random basis; it seems like using
script/runner is the way I’m gonna go. I can’t put it into the
authorization, fwiw, because the application doesn’t have any
authorization layer. I’m gonna look into the scheduling gem.

I feel like I’m missing a major point here. Assuming the table is
correctly range partitioned and indexed, most databases should be
able to handle relatively large table sizes. I agree that is a best
practice to archive old, unused data, but that can likely be done on
a monthly basis, or less often, depending on traffic. Why would you
need to consider a solution that “will ramp up proportionally with
traffic”?

Jim
http://www.thepeoplesfeed.com/blog

On Sun, Sep 20, 2009 at 11:42 AM, Peter De Berdt

On 20 Sep 2009, at 17:09, Colin L. wrote:

make decisions on whether to do something’.

Seriously, though, to suggest that something coded using random
numbers to be executed 1% of the time may either never run or always
run is incorrect. Assuming it is correctly coded of course.

Well, since you are going on the philosophical tour here, there’s more
than one random variable coming into play here. Not only the mod 10
result, but also the number of hits on the application, the time at
which they hit the application etc. That’s not even playing with
probabilities, that’s just plain gambling.

All I was trying to point out, is that you have no way of knowing if
and when the sessions table would be cleaned, just like you have no
way of knowing if you have a chance of winning a game of bingo or the
lotto, since you are bringing in a lot more variables than just the
semi-random computer generated ones. You could hit it the first time,
you could hit it twice in a row and you could wait days to hit it. The
fact that you have a 10% chance or a 1% chance of hitting the right
number is still a probability, not a certainty. When it comes to
cleaning a table that just keeps piling up records that become stale,
I do like to have some kind of guarantee that it will clean when I
want it, not when quantum physics and random people surfing to my
application decide it’s the right time.
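
To put rough numbers on this disagreement: if each request independently triggers a cleanup with probability p, the chance that n consecutive requests all skip it is (1 - p)^n, and the average gap between cleanups is 1/p requests. The sketch below only does that arithmetic. The function names are made up for illustration, and it says nothing about how long those n requests take to arrive, which is the part that depends on traffic.

```ruby
# Chance that none of n independent requests triggers the cleanup,
# given a per-request trigger probability p.
def chance_of_no_cleanup(p, n)
  (1.0 - p)**n
end

# Average number of requests between two cleanups.
def expected_requests_between_cleanups(p)
  1.0 / p
end

# Print the figures for a 10% and a 1% trigger chance.
[0.1, 0.01].each do |p|
  [10, 100, 1000].each do |n|
    printf("p=%.2f, %4d requests: %.6f chance of no cleanup\n",
           p, n, chance_of_no_cleanup(p, n))
  end
end
```

So on a busy site a long run without any cleanup is very unlikely, while on a site that sees a handful of requests a day the same filter can plausibly go days without firing.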

Best regards

Peter De Berdt

On 20 Sep 2009, at 18:35, James E. wrote:

I feel like I’m missing a major point here. Assuming the table is
correctly range partitioned and indexed, most databases should be
able to handle relatively large table sizes. I agree that is a best
practice to archive old, unused data, but that can likely be done on
a monthly basis, or less often, depending on traffic. Why would you
need to consider a solution that “will ramp up proportionally with
traffic”?

The original poster was asking for the correct way to clean the
sessions table. Although there is no clear-cut answer to that one, I
personally feel random number generation is by no means the correct
way to go. Yes, the database should be able to look up sessions very
quickly, but as you pointed out, depending on traffic, it will
eventually waste resources as the number of records increases,
both in terms of server cycles and storage.

Now, until cookie-based storage became available, we used the database
for session storage and used quite a few techniques over the years
we’ve been developing Rails apps. As the number of applications
increased, we started handling it differently. Roughly, we used:

  • First couple of applications: before_filter triggered by
    authentication (or some other action that clearly had to do with
    sessions)
  • Cron tab that invokes script/runner during low traffic times (the
    problem here was that for each of the applications, a whole Rails
    instance was started and that consumed quite a bit of memory as the
    number of apps increased on the VPS we then had)
  • Cron tab that invoked the mysql command line and just went through
    all of the databases, deleting stale sessions in a single run

The last solution was really quick, used very few resources and
worked fine during the time we actually needed it. It was a little
bash script, nothing special, along the lines of:

mysql -h localhost -u[someuser-with-necessary-privileges] < sql_commands_file

where sql_commands_file just had a series of commands to clean the
sessions:

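-- MySQL: compare against an interval; NOW() - updated_at is not in seconds
USE databasename1;
DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR;
USE databasename2;
DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR;
USE databasename3;
DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL 1 HOUR;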

I think we cleaned it up a bit by just generating the whole sequence
of sql commands in bash using a loop, but you get the picture.
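
As a rough illustration of generating the command sequence in a loop, here is the same idea in plain Ruby rather than bash (no Rails is loaded). The database names are the placeholders used above and cleanup_sql is a made-up helper name. It uses an explicit INTERVAL comparison because, in MySQL, subtracting two DATETIME values does not give a number of seconds.

```ruby
# Placeholder database names; replace with the real ones.
DATABASES = %w[databasename1 databasename2 databasename3]

# Build one USE + DELETE pair per database. Integer() rejects anything
# that is not a whole number before it is interpolated into the SQL.
def cleanup_sql(databases, max_age_seconds = 3600)
  databases.map do |db|
    "USE `#{db}`;\n" \
    "DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL #{Integer(max_age_seconds)} SECOND;\n"
  end.join
end

# When run as a script, print the statements so they can be piped to mysql.
puts cleanup_sql(DATABASES) if __FILE__ == $0
```

Saved under a name of your choosing, for example session_cleanup_sql.rb, the output can be piped straight into the client: ruby session_cleanup_sql.rb | mysql -h localhost -u[someuser-with-necessary-privileges]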

Best regards

Peter De Berdt

yeah, my main point was that the method would be run for every
request. Probably not that many milliseconds in the grand scheme of
things, but why add any extra processing to your requests when you can
externalize it?

On Fri, Sep 18, 2009 at 4:45 PM, Peter De Berdt
[email protected] wrote:

No it won’t. It has randomization code that causes it to not run
most of the time. This is exactly how session gc should be handled.
It will ramp up proportionally with traffic.

Actually that could be never or always, relying on random numbers to make
decisions on whether to do something “most of the time” is a bad idea.

No it’s not. It’s not relying on random numbers in the sense you are
implying. The random numbers are just a way to implement a mod
percentage, as in not doing it “most of the time”.

Look at the way PHP does session garbage collection for example. You
set a callback function that only works some of the time.

http://us.php.net/manual/en/function.session-set-save-handler.php

http://us.php.net/manual/en/session.configuration.php#ini.session.gc-divisor

http://us.php.net/manual/en/session.configuration.php#ini.session.gc-probability

When using db driven sessions you don’t want to clear out all the old
sessions all of the time. You just want a rolling table setup that
clears itself based on traffic flow.
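
For comparison, the PHP scheme linked above runs the garbage collector on a given request with probability gc_probability / gc_divisor. A minimal Ruby sketch of that gate follows. SessionGcGate is a hypothetical name, not part of Rails or PHP, and the 1-in-100 default is just the ratio used for illustration here.

```ruby
# Hypothetical class: the name and interface are illustrative only.
class SessionGcGate
  # probability / divisor is the chance that a given call triggers a
  # cleanup, mirroring PHP's gc_probability and gc_divisor settings.
  # rng is any object responding to #rand(n); Kernel works by default.
  def initialize(probability = 1, divisor = 100, rng = Kernel)
    raise ArgumentError, "divisor must be positive" unless divisor > 0
    @probability = probability
    @divisor = divisor
    @rng = rng
  end

  # Runs the block on roughly probability-in-divisor calls.
  # Returns true if the block ran, false otherwise.
  def maybe_run
    return false unless @rng.rand(@divisor) < @probability
    yield
    true
  end
end
```

In a before_filter this would wrap the DELETE, for example gate.maybe_run { ActiveRecord::Base.connection.execute(sql) }, where gate and sql are whatever the application defines.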


Greg D.
http://destiney.com/

On Mon, Sep 21, 2009 at 2:26 PM, sax [email protected] wrote:

yeah, my main point was that the method would be run for every
request.

Just like your before_filter for user authentication.

Probably not that many milliseconds in the grand scheme of
things, but why add any extra processing to your requests when you can
externalize it?

Putting it in cron doesn’t guarantee it will always find something to
delete. It just means you now have to maintain a cron entry external
to your actual app.


Greg D.
http://destiney.com/

Putting it in cron doesn’t guarantee it will always find something to
delete. It just means you now have to maintain a cron entry external
to your actual app.

That’s true. So just use the rufus-scheduler? ;)