5 minutes to save 25,000 model objects?

Hi guys. On my laptop, I’m finding that it’s taking about 5 minutes to
save 25,000 model instances.

The model has 5 validations, and I’m saving all of the model instances
in a single transaction:
http://pastie.org/509281

25,000 model instances is a lot, but 5 minutes seems too long. Any
suggestions for how to speed this up?

Thanks,
Nick

Nick H. wrote:
[…]

25,000 model instances is a lot, but 5 minutes seems too long. Any
suggestions for how to speed this up?

Thanks,
Nick

Since you’re calling k.save each time through the loop, you’re making
25,000 database queries. Don’t do that – instead, build one big insert
query with all the data (or at least do it in batches of 1000 or so).
ar-extensions might help here, or there may be another batch insert
plugin; failing that, building the SQL directly wouldn’t be too
difficult.

In general, a query within a loop is a sure sign that something is
wrong, or at least inefficient. Databases work best when you give them
all the data at once and let them sort out how to handle it.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Thu, Jun 11, 2009 at 11:24 PM, Ashwin
Mangale[email protected] wrote:

Try commenting out the uniqueness constraint and check the result. To be
fair, the validations listed are fairly standard and can be done before the
keyword object is created ( client-side using something like ajax + jquery
).

Hope this helps.

-Ashwin

Hi Ashwin. I’m finding that the uniqueness constraint doesn’t add much
additional processing time.

You’re right that the validations are quite standard. However, you
can’t rely on input data; it must be validated server-side.

Thanks,
Nick

On Jun 11, 11:29 pm, Marnen Laibow-Koser <rails-mailing-l…@andreas-
s.net> wrote:

Since you’re calling k.save each time through the loop, you’re making
Best,

Marnen Laibow-Koserhttp://www.marnen.org
[email protected]

Hi Marnen. I’m surprised I haven’t come across ar-extensions yet. It
looks like the exact solution I need to this problem. Thanks for
recommending it!
-Nick

On Jun 12, 4:54 am, Michael S. [email protected] wrote:

Hope this helps.

save
mailto:[email protected]://www.schuerig.de/michael/
Are you sure about that?:
http://pastie.org/509569

On Friday 12 June 2009, Nick H. wrote:

-Ashwin

Hi Ashwin. I’m finding that the uniqueness constraint doesn’t add
much additional processing time.

It’s a bogus constraint, however, because…

You’re right that the validations are quite standard. However, you
can’t rely on input data; it must be validated server-side.

it checks uniqueness in the wrong place. validates_uniqueness_of does
not preclude an interleaving of operations of operations from two
processes that looks like this

      A                            B

create(:keyword => ‘foo’)
validate uniqueness
create(:keyword => ‘foo’)
validate uniqueness
save
save

To be safe, you must have a constraint in the database. If you have
that, the additional app-level validation just wastes performance.

add_index :keywords, :keyword, :unique => true

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

Quoting Nick H. [email protected]:

something like ajax + jquery ).
You’re right that the validations are quite standard. However, you
validate uniqueness

Michael S.
mailto:[email protected]://www.schuerig.de/michael/

Are you sure about that?:
http://pastie.org/509569

Your example is two saves in the same thread/process. The
counter-example
REQUIRES two concurrent processes/threads. And yes his counter-example
is
correct. Unless you can guarantee that your application will NEVER run
multi-process/multi-threaded, e.g., Webrick. If your application will
ever be
run by multiple people at once, it will almost certainly require this.
Such
is the price of success.

Also pushing the uniqueness validation into the database will cut your
run
time. The uniqueness validation is implemented by doing a lookup on the
unique field before doing the insert. Letting the DB do it cuts the
number of
database calls in half. Doing the inserts in batches will also increase
the
speed significantly. Yeah, the code isn’t as pretty but if it is
already
clear you need the speed, do it. It is doubtful that it will be the
ugliest
bit of code in your app.

HTH,
Jeffrey

I had a similar situation where I had to import groups of around 2,500
objects at a time based upon the files found within a directory, back
before ar-extensions was around, and by outputing a text file and then
telling mysql to use that as an infile, the time dropped from a total
of 45 seconds on average to about 2 seconds, for the whole action. I
expected some improvment, but I had no idea how much faster it would
really be until I did it.

Carl

Try to use bulk insert for save all of your instances.
http://www.igvita.com/2007/07/11/efficient-updates-data-import-in-rails/

in a single transaction:http://pastie.org/509281

25,000 model instances is a lot, but 5 minutes seems too long. Any
suggestions for how to speed this up?

You might also look at your database to see if it can handle a
transaction that large. Depending on how it does it may need to store
quite a lot in memory…