Ferret success stories?

Hi all,

there was a recent thread[1] on rails-deploy about Ferret in which a lot
of people complained of problems using it in production.

I’ve been using Ferret (with DRb) for many months now with no serious
issues. I’m assuming the posters know what they’re doing, so I’m
guessing they’re just using Ferret in higher-scale environments than I am.

I spoke to someone in person yesterday who claimed that Ferret over DRb
couldn’t keep up with their load and that they had been investigating
replicating the Ferret index between two machines.

With all these bad experiences, I’d like to hear about some good
experiences. Anyone care to comment? Anyone using it under huge load?
Care to provide some numbers and some notes about how you’ve made it
work?

John.

[1] http://groups.google.com/group/rubyonrails-deployment/browse_thread/thread/980fe7cb20cb97dd/bc798b52f439020c


http://www.brightbox.co.uk - UK Ruby on Rails hosting

On Jan 25, 2008, at 7:02 AM, John L. wrote:

> I spoke to someone in person yesterday who claimed that Ferret over DRb
> couldn’t keep up with their use rate and had been investigating
> replicating the ferret database between two machines.
>
> With all these bad experiences, I’d like to hear about some good
> experiences. Anyone care to comment? Anyone using it under huge load?
> Care to provide some numbers and some notes about how you’ve made it
> work?

I used Ferret with DRb at Technorati on a project that had several
indexes of 5-10M documents. To make it work well I had to limit the
update rate to the index (I think I took it down to about 1-2 updates per second).

To support higher update rates, I would have had to change how we
wrote and served the indexes.

-ryan
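Ryan doesn’t say how he capped the update rate. One common approach, sketched here under assumptions, is to queue updates, keep only the newest version of each document, and write them to the index in batches no more often than a fixed interval. `ThrottledIndexer`, `min_interval`, and the `sink` block (a stand-in for whatever actually writes to the Ferret/DRb index) are hypothetical names, not part of Ferret or aaf.

```ruby
# Hypothetical throttled writer: queues document updates, keeps only the
# newest version of each document, and hands them to `sink` in batches no
# more often than once every `min_interval` seconds.
class ThrottledIndexer
  def initialize(min_interval:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &sink)
    @min_interval = min_interval # seconds between writes to the index
    @clock = clock               # injectable so timing can be tested
    @sink = sink                 # stand-in for the real index writer (assumption)
    @pending = {}                # doc id => latest document
    @last_flush = nil
  end

  # Queue an update; a later update to the same id replaces the earlier one.
  def enqueue(id, doc)
    @pending[id] = doc
    flush_if_due
  end

  # Write the pending batch if enough time has passed since the last write.
  def flush_if_due
    return false if @pending.empty?
    now = @clock.call
    return false if @last_flush && now - @last_flush < @min_interval
    batch = @pending
    @pending = {}
    @sink.call(batch)
    @last_flush = now
    true
  end
end

# Usage with a fake clock so the timing is deterministic.
t = 0.0
writes = []
indexer = ThrottledIndexer.new(min_interval: 0.5, clock: -> { t }) { |batch| writes << batch.keys }
indexer.enqueue(1, "first")  # first write goes through immediately
t = 0.1
indexer.enqueue(2, "draft")  # too soon: queued
indexer.enqueue(2, "final")  # replaces the queued draft
t = 0.6
indexer.flush_if_due         # 0.6 s since the last write: batch is written
p writes                     # prints [[1], [2]]
```

Because flushing only happens when `enqueue` or `flush_if_due` is called, a real setup would also need something (a background thread or a periodic job, say) to call `flush_if_due` so the last batch isn’t left pending. A `min_interval` of 0.5 seconds caps writes at roughly two batches per second.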

I’m using Ferret/aaf with the DRb server under a medium load at
Ravelry.com. I think that we peak at 10-12 queries per second and a
little less than 1 update per second.

My biggest problem has been indexing speed. I’ve been gradually
switching over to Sphinx (http://www.sphinxsearch.com/) for indexes
that don’t have to be updated in real time (places where I can afford
several minutes of lag). Emulating near-real-time index updates in
Sphinx is a little hacky, but I find it worth it.

Casey

I am successfully using Ferret, with a few caveats, none of which
would, to my knowledge, be solved by switching to another solution.

I believe many corruption problems are associated with a bug in aaf
(acts_as_ferret) that causes multiple indexes to be built at the same
time in the same location. I’m not sure whether that’s been addressed,
as I didn’t see any response after reporting it months ago.

My primary issue is that the Ferret index needs to be optimized
before it performs well. Since optimization locks the index against
reads and can take 30 seconds on my index, this severely limits
how often I can update it. I am working on a solution that runs
two Ferret servers, but it requires extensive modifications to the
aaf plugin.

I have investigated Sphinx, but I don’t think it solves my problem
of a large, constant stream of updates.


Alex Neth
Liivid Inc / cribQ
www.liivid.com / www.cribq.com
+1 206 499 4995
+86 13761577188
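Alex’s actual locking code isn’t shown in the thread. As a rough illustration of the idea, a rebuild can be guarded with an exclusive, non-blocking `flock` on a lock file inside the index directory, so a second rebuild attempt is skipped instead of writing to the same index concurrently. `with_rebuild_lock` and the `rebuild.lock` file name are hypothetical; `flock` only coordinates processes on the same host, and this is not how aaf itself handles rebuilds.

```ruby
require "tmpdir"

# Hypothetical guard: runs the block only if no other rebuild holds the
# lock for this index directory. Returns true if the block ran, false if
# another rebuild was already in progress.
def with_rebuild_lock(index_dir)
  lock_path = File.join(index_dir, "rebuild.lock")
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # LOCK_NB makes flock return false instead of waiting when the lock is held.
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)
    begin
      yield
      true
    ensure
      f.flock(File::LOCK_UN) # File.open's block also closes the file afterwards
    end
  end
end

# Usage: a second attempt made while the first holds the lock is skipped.
Dir.mktmpdir do |dir|
  second = nil
  first = with_rebuild_lock(dir) do
    second = with_rebuild_lock(dir) { raise "parallel rebuild ran" }
  end
  puts "first ran: #{first}, second ran: #{second}"
end
```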

Hi Alex,

sorry about not responding to that mail; I was pretty busy with other
stuff at the time.

I just dug it out of my mail archive, and yes, I’d like to have a
look at your locking code to prevent parallel index rebuilds :-)

Regarding your optimization troubles: I’m afraid I don’t know why
your index is slow when you don’t optimize it (unless you’re sorting
your search results by something other than relevance; that is indeed
known to be slow when the index isn’t optimized).

Did you try tweaking some of Ferret’s more obscure indexing parameters,
like :merge_factor? Lowering it (the Ferret shortcut suggests a value of
2 or 3 for better search performance) makes Ferret merge segments more
frequently, so on average there are fewer files in the index, which
should improve search performance.

Cheers,
Jens
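To see why a lower :merge_factor should leave fewer files behind, here is a simplified, hypothetical model of log-structured segment merging. Ferret is a port of Lucene, but treat the assumption that its merge policy works exactly like this as unverified: each flush writes a new segment, and whenever `merge_factor` segments pile up at one level they are merged into a single segment at the next level. `segment_count` is an illustrative helper, not a Ferret API.

```ruby
# Simplified merge model (an assumption, not Ferret's actual merge code).
# Returns how many segments remain after `flushes` segment flushes.
def segment_count(flushes, merge_factor)
  levels = Hash.new(0) # level => number of segments at that level
  flushes.times do
    levels[0] += 1     # each flush adds one small segment
    level = 0
    # When a level fills up, merge its segments into one at the next level.
    while levels[level] == merge_factor
      levels[level] = 0
      levels[level + 1] += 1
      level += 1
    end
  end
  levels.values.sum
end

[10, 3, 2].each do |factor|
  puts "merge_factor #{factor}: #{segment_count(999, factor)} segments"
end
```

In this model, 999 flushes leave 27 segments at a merge factor of 10, but only 3 at 3 and 8 at 2 (never more than one segment per level for those factors). The trade-off is more merging work while indexing, so a lower merge factor can make updates slower.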

On Fri, Jan 25, 2008 at 04:00:33PM -0800, Alex Neth wrote:


Date: Fri, 25 Jan 2008 15:02:50 +0000




Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database