Hi everyone, David R. here; I’m new to this mailing list but I’ve been
incredibly wrapped up in Ruby on Rails for the last two or three months.
I’ve built a fully functional web application and we’re about to roll it
out
to our customers. However, my supervisor would like to add just one
more feature
to it: listing the number of online users for a particular ISP.
Here’s the problem: the table that I need to pull data from has over
7,000,000 entries in it, and my rails app hangs when trying to process
all
this data, and the server it’s on needs to be rebooted. This
application
will service a very small number of users, so we have placed my app on a
weaker box; I know that upgrading the box to 2GB of RAM and a stronger
processor would probably fix this problem, but the process would still
take
a few minutes to run successfully even on a machine that could handle
it.
My question is this: Is there a graceful Rails way to alleviate this
problem?
Welcome to the list! I think one solution to your problem would be
to create a script that uses ActiveRecord outside of your rails app.
Then you could have this script run every 5 or 10 minutes and write
the results to disk. THen when someone hits your rails app for the
information you can just serve up the cached results.
over 7,000,000 entries in it, and my rails app hangs when trying to
process all this data, and the server it’s on needs to be
rebooted. This application will service a very small number of
users, so we have placed my app on a weaker box; I know that
upgrading the box to 2GB of RAM and a stronger processor would
probably fix this problem, but the process would still take a few
minutes to run successfully even on a machine that could handle it.
My question is this: Is there a graceful Rails way to alleviate
this problem?
Hey,
this is kind of a vague question. You might want to show a) exactly
what you’re trying to achieve and (more importantly) b) how you’re
currently doing it
That said, you’re not actually pulling in all 7 million records are you?
You say “list the number of online users” and then say “hangs trying
to process all this data”. It sounds like you’re doing:
SevenMillion.find(:all).length
when this would be a tad faster:
SevenMillion.count
Provide some more (targeted/complete) information and you’ll probably
get a speedy and accurate answer from members of the list.
Much thanks to everyone who responded. Here’s what I’m trying to do
exactly:
In our database, there is a Calls table that holds all of the calls made
for
roughly 12 ISPs that our company provides ports to. The Calls table
holds
the following data that’s important to me:
The modem box the customer dialed into, the customer’s username, the
customer’s status type (I’ll elaborate on this in a moment), and the
date/time the call was made.
Each customer gets assigned a “status type” of 1 when they sign on
successfully, a status type of 3 when they’ve been logged on for a
period of
time (not sure what this interval is) and are still active and thus they
are
not kicked offline for being idle, and a status type of 2 when they sign
off.
Unfortunately, our system doesn’t have a database that only has
currently
online users, so I can’t pull from any such luxurious place. In order
to
find out who is currently online, I have to:
Grab all of the calls from the Calls table where the status type is
equal to 2 (because these are users that have signed out) and store them
in
an array.
Becuase each user will have many entries in the Calls table where
they
have signed out (status type 2), I find those entries in the array (this
is
a gigantic array now) that are listed an even number of times and delete
them from the array. (Reason: if the user has an odd number of
signouts,
then that user is currently online.)
I’m left with a smaller array of nothing but usernames of users who
are
currently online. Now I use array.uniq to shrink the array one last
time to
just one entry per user. I can then just list these in a table in the
view.
The problem is that it has to do this seven million (or more) times.
number of times and delete them from the array. (Reason: if the
user has an odd number of signouts, then that user is currently
online.)
I’m left with a smaller array of nothing but usernames of users
who are currently online. Now I use array.uniq to shrink the array
one last time to just one entry per user. I can then just list
these in a table in the view.
The problem is that it has to do this seven million (or more) times.
Yeah that is going to be a problem. Creating that big of an array
and then doing operations on it in ruby is the bottleneck. Is there
any way you can get this information with pure sql? That would
probably be the better way to do it. ANd also for something liek this
you probably don’t want to use activerecord objects because they are
expensive to instantiate. You may want to use just the mysql-ruby
bindings(or whatever db you are using) to do the queries directly.
Regardless trying to work with arrays that large all in memory at
once is going to choke ruby. SQL is much better at manipulating large
data sets like this.
Grab all of the calls from the Calls table where the status type
who are currently online. Now I use array.uniq to shrink the array
one last time to just one entry per user. I can then just list these
in a table in the view.
The problem is that it has to do this seven million (or more) times.
I think you really want to do this with find_by_sql. All of your
conditions can be done database side which should be several orders of
magnitude faster.
Grab all of the calls from the Calls table where the status type is
equal to 2 (because these are users that have signed out) and store them in
an array.
Becuase each user will have many entries in the Calls table where they
have signed out (status type 2), I find those entries in the array (this is
a gigantic array now) that are listed an even number of times and delete
them from the array. (Reason: if the user has an odd number of signouts,
then that user is currently online.)
I don’t follow your logic here – you’ve stated above if they’ve logged
out then they have a status of 2 then they’ve logged out. Why then the
count of the odd number of statuses?
Also, why not do a query such as
select user, count(status) group by user having status = 2
Of course, it depends on your background database, but … This way
you
would still need to go through and figure who was even or odd, but
there’d
be a whole lot less data to process!!
Find_by_sql and the link Matt provided look to be exactly what I’m
looking
for. I have two days to clean up my app’s code and get it rolled out to
customers, and if in that time I have a window of opportunity to
implement
this idea in version 1.0 I’ll definitely be using these suggestions.
Thanks guys, if anything arises I’ll keep you posted. (This mailing
list is
a bag of pure awesome.)
Each customer gets assigned a “status type” of 1 when they sign on
successfully, a status type of 3 when they’ve been logged on for a period of
time (not sure what this interval is) and are still active and thus they are
not kicked offline for being idle, and a status type of 2 when they sign
off.
I’m not sure exactly how the data is stored, but here’s some example
SQL.
If there is a single row for each call that gets updated as the status
changes (the most intelligent design), it should be as simple as:
SELECT count(1) FROM calls WHERE status_type = 1 OR status_type = 3;
Assuming a properly indexed table this should be an extremely fast
query.
If there are multiple rows for each call (each status type in a
separate row, very ugly):
SELECT count(1) FROM
(SELECT user_id FROM calls
WHERE status_type = 1 OR status_type = 2
GROUP BY user_id
HAVING sum(CASE WHEN status_type = 1 THEN 1 ELSE 0 END)
= sum(CASE WHEN status_type = 2 THEN 1 ELSE 0 END) + 1
) AS foo;
This will be quite a bit slower, but still much faster than using ruby.