Multiple index instances and ferret/acts_as_ferret

We’re running Ferret and acts_as_ferret in our production environment. We
have multiple mongrels talking to a single index on a separate (virtual)
server over DRb. This is working ok for now, as our index updates are fairly
infrequent. I’m concerned with the lack of redundancy/scalability in this
layout.

Our index won’t get too big - maybe 100k indexed objects, each no more than
500 words - but it needs to be highly available, like the rest of our site
(www.caring.com, if you are interested).

One alternate approach I’m considering would be to do something like this:

  • disable the after_save callbacks in acts_as_ferret in production mode, to
    stop multiple mongrels writing to the index.
  • move all index writes to a centralized batch process which interacts with
    a ‘master’ index
  • periodically clone out the master index to slave indexes located locally
    to each user-facing rails index (not using DRb)

My last company used this approach for a lucene index, with lucene running
behind a custom search webapp not that different from SOLR, so the
user-facing webservers retrieved search results over http from the search
webapp. We had to write some fairly intricate scripting to stop and start
the search webapps whilst we copied out the master index to the slaves.

Does anyone have any experience with this kind of approach? Is there some
standard way to distribute and run multiple instances of an index?

Bonus question - how upset does a running mongrel get when the ferret index
it talks to is suddenly replaced by a new set of files?

Thanks for any insights on how best to solve this.

Thanks,
Patrick Wright

Not upset at all if you do it like that: have two index directories and
a symlink pointing to the one in use atm. Then sync to the currently
unused index, change over the symlink and tell your mongrel to re-open
its searcher.

As Jens said, we’re doing exactly that … take a look at the switching here:

http://bugs.omdb.org/browser/branches/2007.1/lib/omdb/ferret/lib/util.rb

We’ve also added a “last-switched” status file (zero bytes; only its
timestamp matters) to notify the mongrels of a new index. Each mongrel keeps
using the old index until the status file gets a newer timestamp.

http://bugs.omdb.org/browser/branches/2007.1/lib/omdb/ferret/searcher.rb
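
The status-file check can be sketched roughly as follows. This is not the
omdb code linked above, only a minimal illustration using Ruby's standard
library; the class name ReloadingSearcher and the block-based opener are
assumptions made for the example.

```ruby
# Hypothetical helper (not part of Ferret or acts_as_ferret): hands out a
# searcher and reopens it when the status file's mtime becomes newer than
# the one seen at the last (re)open.
class ReloadingSearcher
  def initialize(status_file, &opener)
    @status_file = status_file
    @opener = opener   # block that builds a fresh searcher on the live index
    @searcher = nil
    @seen_at = nil
  end

  # Returns the current searcher, reopening it first if the index was switched.
  def searcher
    stamp = File.exist?(@status_file) ? File.mtime(@status_file) : nil
    if stale?(stamp)
      @searcher.close if @searcher.respond_to?(:close)
      @searcher = @opener.call
      @seen_at = stamp
    end
    @searcher
  end

  private

  # True on first use, or when the status file carries a newer timestamp.
  def stale?(stamp)
    @searcher.nil? || (stamp && (@seen_at.nil? || stamp > @seen_at))
  end
end
```

Each mongrel would build one of these at startup, passing a block that opens
whatever searcher object it uses on the symlinked index path, and call
searcher before every query. The cost per request is a single stat of the
status file.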

Ben

On Wed, Oct 10, 2007 at 12:19:17PM -0700, Patrick Wright wrote:
[…]

One alternate approach I’m considering would be to do something like this:

  • disable the after_save callbacks in acts_as_ferret in production mode, to
    stop multiple mongrels writing to the index.
  • move all index writes to a centralized batch process which interacts with
    a ‘master’ index
  • periodically clone out the master index to slave indexes located locally
    to each user-facing rails index (not using DRb)

should work.

[…]

Does anyone have any experience with this kind of approach? Is there some
standard way to distribute and run multiple instances of an index?

omdb.org uses rsync to sync index versions.

Bonus question - how upset does a running mongrel get when the ferret index
it talks to is suddenly replaced by a new set of files?

Not upset at all if you do it like that: have two index directories and
a symlink pointing to the one in use atm. Then sync to the currently
unused index, change over the symlink and tell your mongrel to re-open
its searcher.
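
A minimal sketch of the symlink switch, assuming two index directories named
"a" and "b" under one root and a symlink called "current" (all three names,
and the IndexSwitch module itself, are made up for this example). The sync
step (e.g. rsync into the standby directory) and the signal to the mongrels
are left out.

```ruby
# Hypothetical helper: two index directories ("a" and "b") live under +root+,
# and a "current" symlink points at the one in use.
module IndexSwitch
  module_function

  # Name of the directory the symlink does NOT point at, i.e. the one that
  # is safe to sync a new index version into.
  def standby(root)
    link = File.join(root, "current")
    live = File.symlink?(link) ? File.basename(File.readlink(link)) : "b"
    live == "a" ? "b" : "a"
  end

  # Points "current" at +name+. A temporary symlink is created and then
  # renamed over the old one, so on POSIX systems readers never see a
  # missing link.
  def switch_to(root, name)
    link = File.join(root, "current")
    tmp  = "#{link}.tmp"
    File.unlink(tmp) if File.symlink?(tmp)
    File.symlink(name, tmp)   # relative target, resolved inside root
    File.rename(tmp, link)    # rename(2) replaces the link itself
    name
  end
end

# Usage, with the sync step assumed to happen in between:
#   target = IndexSwitch.standby("/var/index")
#   ... sync the new index version into /var/index/#{target} ...
#   IndexSwitch.switch_to("/var/index", target)
```

Searchers that already have the old directory open keep reading the old
files until they are re-opened, which is why the standby directory must not
be overwritten again before every mongrel has switched.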

Cheers,
Jens


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database