Remote index blocks?

iamdirkcalloway · September 27, 2007, 9:13pm

Using DRb allows me to synchronize writes to the index in a
multi-mongrel environment. I was under the impression that the remote
index would not block if two mongrels were searching the index. Is that
the case? This line in ferret_server.rb makes me think otherwise:

   # Calls are not queued atm, so this will block until the call returned.
   #
   def method_missing(name, *args)

I see how the above would allow for synchronizing writes, but I don’t
see how it would allow for concurrent reads.

I’m seeing some issues with production performance of queries, and
I’d like to figure out if concurrent queries against the remote
server will block or if they are run in different threads. I’m
running on Linux with the trunk version of AAF and Ruby 1.8.4.

Erik

iamdirkcalloway · September 27, 2007, 10:28pm

On Thu, Sep 27, 2007 at 03:12:51PM -0400, Erik M. wrote:

Using the Drb allows me to synchronize writes to the index in a multi
mongrel environment. I was under the impression that the remote index
would not block if two mongrels were searching the index. Is that the
case? This line in ferret_server.rb makes me think otherwise:
   # Calls are not queued atm, so this will block until the call returned.

Don’t worry, it’s only bad wording.

What this means is only that indexing is not done in an asynchronous
way. So your call to Model#save which triggers an index update won’t
return until the server has finished adding that record to the index.

Other processes will get their own threads on the DRb side.
Synchronization is done in Ferret’s Index class, which allows concurrent
searches.
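
To illustrate the idea (serialized writes, concurrent searches), here is a minimal sketch in plain Ruby. ToyIndex is a hypothetical stand-in and not Ferret's actual Index class, and the threads only play the role of the per-call threads on the DRb side; how Ferret really synchronizes internally is not shown here.

```ruby
# Toy stand-in for an index. This is NOT Ferret's Index class.
class ToyIndex
  def initialize
    @write_lock = Mutex.new
    @docs = [].freeze            # immutable snapshot shared by readers
  end

  # Writers are serialized: only one thread at a time runs this block.
  def add(doc)
    @write_lock.synchronize do
      @docs = (@docs + [doc]).freeze   # publish a new snapshot
    end
  end

  # Readers take no lock; they search whatever snapshot is current.
  def search(term)
    @docs.select { |d| d.include?(term) }
  end
end

index = ToyIndex.new

# Each thread plays the role of one incoming remote call.
writers = 4.times.map do |i|
  Thread.new { 25.times { |j| index.add("doc #{i}-#{j} ferret") } }
end
readers = 4.times.map do
  Thread.new { 50.times { index.search("ferret") } }
end
(writers + readers).each(&:join)

puts index.search("ferret").size   # 100
```

A writer holding the lock only delays other writers. The readers never wait on it, which is the behaviour described above for searches.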

cheers,
Jens

–
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

iamdirkcalloway · September 28, 2007, 9:43pm

Thanks Jens.

I’m seeing very strange behavior. The ferret/drb server is running on
the web server. I disable AAF in my environment.rb like so:

MyModel.disable_ferret

Then, every hour I run a script that grabs the records that have been
updated and I call

MyModel.bulk_index array_of_changed_objects

When there are, say, 100 or so changed objects, the server running
DRb shows approximately 50% of the CPU waiting on IO and drops from
120MB of free memory to 10MB, though the ruby/drb process doesn’t seem
to actually consume that memory, or at least top doesn’t report it.
While the batch update is happening it seems like my entire site is
locked up. Requests usually hang until the indexing completes. Further,
I can’t even run script/console from a different machine until the
indexing completes. I’m doing the bulk index on two indexes. One is
about 41K records, the other is 1 million records. In both cases at
most 100 or so objects needed to be indexed in bulk. The fact that
script/console, when run from a different server, doesn’t load until
the indexing stops makes me think that either something is blocking in
the ferret/drb server, or the optimization of the 3GB index after the
bulk_index of 100 records is consuming all of the web server’s
resources.

Any idea what is going on or how I can debug this issue?

Thanks in advance.

iamdirkcalloway · September 29, 2007, 6:13pm

On Fri, Sep 28, 2007 at 03:42:30PM -0400, Erik M. wrote:

MyModel.bulk_index array_of_changed_objects

When there are, say, 100 or so objects that have changed the server
where DRB is will have approximately 50% of the CPU waiting on IO,
and will drop from 120MB free of memory to 10MB free, though the ruby/
drb process doesn’t seem to actually consume that memory – or at
least it’s not reported to by top.

Where do you get these numbers from? Possibly it’s just the OS using
unused RAM for filesystem buffers?
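
One way to check this on Linux is to add the buffer and page-cache figures from /proc/meminfo back onto the free memory. The sketch below does that; the helper names are hypothetical and the sample values are made up for illustration, not taken from the server in question.

```ruby
# Parse the "Key:   12345 kB" lines of /proc/meminfo into a hash of kB values.
def parse_meminfo(text)
  text.each_line.each_with_object({}) do |line, info|
    key, value = line.split(":", 2)
    next if value.nil?
    info[key.strip] = value.to_i   # to_i ignores the trailing " kB"
  end
end

# Memory that is free or merely borrowed for buffers and page cache.
def reclaimable_kb(info)
  info.fetch("MemFree", 0) + info.fetch("Buffers", 0) + info.fetch("Cached", 0)
end

# Made-up sample values, for illustration only.
sample = <<~MEMINFO
  MemTotal:  1740000 kB
  MemFree:     10000 kB
  Buffers:     40000 kB
  Cached:     900000 kB
MEMINFO

puts reclaimable_kb(parse_meminfo(sample))   # 950000

# On a Linux box:
#   reclaimable_kb(parse_meminfo(File.read("/proc/meminfo")))
```

If this figure stays high while "free" drops, the memory is most likely being used as cache rather than consumed by the ruby process.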

consuming all of the web server’s resources.

While the batch update is running, nobody else is able to update the
index. So yes, every other request that wants to update the index will
hang. However, requests not using aaf at all, or searches, should do
fine.

If you feel like the DRb server takes too much CPU, use renice or nice
when you start it to lower its priority.
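
As an alternative to nice/renice on the command line, a Ruby process can lower its own priority with Process.setpriority from the core library. This is only a sketch; where such a call would go in the DRb server's start script is an assumption, and the step of 10 is an arbitrary choice.

```ruby
# Raise this process's nice value (a higher value means lower priority).
current = Process.getpriority(Process::PRIO_PROCESS, 0)   # 0 = this process
target  = [current + 10, 19].min                           # 19 is the lowest priority on Linux
Process.setpriority(Process::PRIO_PROCESS, 0, target)

puts "nice value: #{current} -> #{Process.getpriority(Process::PRIO_PROCESS, 0)}"
```

Lowering priority this way needs no special privileges; raising it back usually does.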

Another possibility that comes to mind is database locks, however I
can’t imagine where these should come from.

With your index size, the optimizing might be the culprit. Just comment
out that portion (in ferret_extensions.rb) and see how it goes without
it.

cheers,
Jens

–
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

iamdirkcalloway · September 29, 2007, 6:28pm

We are going to hack AAF a bit to make the optimize optional. Should
we flush every time we bulk index if we don’t optimize?

I’m running this application on EC2, so I think part of the problem
is the poor IO performance on the VPS.

Thanks.

iamdirkcalloway · October 1, 2007, 4:51pm

On Sat, Sep 29, 2007 at 12:27:50PM -0400, Erik M. wrote:

We are going to hack AAF a bit to make the optimize optional. Should
we flush every time we bulk index if we don’t optimize?

Yes, I’d do so. It will be done when the index class closes the
underlying writer anyway.

I just made bulk_index a bit more configurable, you may now pass
:optimize => false to skip the optimization step.
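
A sketch of what the call might look like from the hourly script. The MyModel class below is only a stub so the snippet runs on its own; the real bulk_index comes from acts_as_ferret, and passing the options as a trailing hash after the records is an assumption based on the description above.

```ruby
# Stub standing in for an acts_as_ferret model (hypothetical, for illustration).
# It only records what it was called with; the real method updates the index.
class MyModel
  @calls = []

  class << self
    attr_reader :calls

    def bulk_index(*records)
      options = records.last.is_a?(Hash) ? records.pop : {}
      @calls << [records.flatten, options]
    end
  end
end

array_of_changed_objects = [:record_a, :record_b]   # placeholder records

# Index the changed records but skip the expensive optimize step.
MyModel.bulk_index(array_of_changed_objects, :optimize => false)

p MyModel.calls.last
```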

I’m running this application on EC2, so I think part of the problem
is the poor IO performance on the VPS.

Yes, I guess poor IO performance and optimizing a 3GB index aren’t an
optimal combination.

cheers,
Jens
