On 9/4/06, Neville B. [email protected] wrote:
Just to clarify, I’m using Ferret::Index::Index concurrently at the
moment, and I’m not getting concurrent searches via #search_each. I.e., if
a slow wild-card search arrives first, all subsequent searches wait
until the wild-card search completes.
So I guess #search_each is “synchronised”?
That’s correct. Otherwise the document IDs could change between the time
a search is run and the time a document is referenced. For the benefit of
those who don’t know this: document IDs are not constant. They represent
the position of the document in the index. Think of it like an array.
Let’s add 5 documents to the index:
[0,1,2,3,4]
Now let’s delete documents 1 and 2:
[0,3,4]
So document 4 now has a doc_id of 2. If this happened in the middle of
a search you’d have a problem, so instead we synchronize the
Index#search and Index#search_each methods. This isn’t the case for
Searcher#search and Searcher#search_each, since the IndexReader that a
Searcher uses stays consistent, so you should be able to use a Searcher
concurrently.
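Here’s a rough sketch of the renumbering using an in-memory index
(untested; the :title field is just for illustration, and the
renumbering only becomes visible once the deleted documents are merged
away, e.g. after an optimize):

    require 'ferret'

    index = Ferret::Index::Index.new          # in-memory index
    5.times { |i| index << {:title => "doc #{i}"} }

    index.delete(1)                           # delete by document number
    index.delete(2)
    index.optimize                            # merge segments; survivors are renumbered

    puts index[2][:title]                     # old doc 4 now answers to doc_id 2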
Therefore, to run multiple searches on an index concurrently, I really
need an IndexReader per thread, and I would need to manage a pool of
reusable IndexReaders?
With Ferret::Index::Index this is true. But if performance is a concern
you should use a Ferret::Search::Searcher object instead anyway, and
you’ll be able to use it concurrently.
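For example, something along these lines should let several threads
search at once (untested sketch; the index path and the :content field
are placeholders, and I believe Searcher.new will take a path, a
Directory or an IndexReader):

    require 'ferret'

    searcher = Ferret::Search::Searcher.new("/path/to/index")

    threads = (1..5).map do
      Thread.new do
        searcher.search_each('content:"ferret"') do |doc_id, score|
          # doc_ids stay valid for the life of this searcher's reader
          puts "#{doc_id} scored #{score}"
        end
      end
    end
    threads.each { |t| t.join }

Just remember that the Searcher only sees the index as it was when its
reader was opened, so you’ll need to open a new one to pick up documents
added since then.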
Any pointers on how other web apps [not using Rails] handle multiple
Ferret readers?
Let us know if using the Searcher object isn’t adequate.
You can actually pass an array of readers as the first (only) parameter
to IndexReader.new:

    reader = IndexReader.new([reader1, reader2, reader3])
Interesting … I had a look, but I don’t really understand what this
does. Would you elaborate, please?
A MultiReader object was originally what you used to read and search
multiple indexes at once. That functionality is now handled by the
IndexReader object itself. There are several uses for this. One is to
store each model in its own index and then offer search across multiple
models through a single multi-index reader. Another use-case is to use
multiple indexes to speed up indexing. If, for example, you are scraping
websites, it is a very good idea to run multiple scraping processes, and
the best way to do that is to have each process write to its own index.
You can then search all of the indexes at once through one reader, or
merge them all into a single index.
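The multi-index search would look roughly like this (untested; the paths
and the :title field are just placeholders, and I believe
IndexReader.new will also take a path or a Directory in place of an
existing reader):

    require 'ferret'

    reader1 = Ferret::Index::IndexReader.new("scraper1/index")
    reader2 = Ferret::Index::IndexReader.new("scraper2/index")

    # one reader over both indexes, doing what MultiReader used to do
    reader = Ferret::Index::IndexReader.new([reader1, reader2])

    searcher = Ferret::Search::Searcher.new(reader)
    searcher.search_each('title:"ruby"') do |doc_id, score|
      puts "#{doc_id} scored #{score}"
    end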
Hope that makes sense.
Cheers,
Dave