Interesting question on how Ferret indexes work

I have a question about how Ferret handles intersection queries. Say I
run an intersection search across two fields, for example
‘ruby author:(jared)’ (without the quotes).

Does Ferret do some kind of fancy index intersection to efficiently
retrieve only items that meet both queries? Or does it just find all
documents matching ruby, then prune away everything that doesn’t match
author:(jared)? Or does it run both searches independently, then take the
intersection?

Help understanding this would be much appreciated.

On Thu, Sep 20, 2007 at 06:10:45AM +0200, Jared Friedman wrote:

> I have a question about how Ferret handles intersection queries. Say I
> do an intersection search on two fields, so I do ‘ruby author:(jared)’
> (no quotes)
>
> Does Ferret do some kind of fancy index intersection to efficiently
> retrieve only items that meet both queries? Or does it just find all
> documents matching ruby, then prune away everything that doesn’t match
> author:(jared)? Does it run both searches independently, then take the
> intersection?

That’s an interesting question, and I’m not really sure about the
answer.

From looking at the ConjunctionScorer class of an old version of Ferret
(0.3.2, which was still mostly Ruby), I’d say it first collects the
document ids matching each single clause, and then steps through them
collecting those that have been matched by all clauses.
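
To make that reading concrete, here is a minimal Ruby sketch of the
idea. It is not Ferret's actual ConjunctionScorer code: the method name
intersect_postings and the plain sorted arrays of document ids are
assumptions made purely for illustration.

```ruby
# Hypothetical sketch, NOT Ferret's real implementation.
# Each clause contributes one sorted array of matching document ids.
# We keep one cursor per clause and step through all lists together,
# keeping only the ids that every clause has matched.
def intersect_postings(postings)
  return [] if postings.empty? || postings.any?(&:empty?)

  positions = Array.new(postings.size, 0) # one cursor per clause
  matches = []

  loop do
    # Document id currently under each cursor.
    current = postings.each_with_index.map { |list, i| list[positions[i]] }
    target = current.max

    if current.all? { |id| id == target }
      # Every clause agrees on this document: record it and move on.
      matches << target
      positions.map! { |pos| pos + 1 }
    else
      # Advance the lagging cursors up to the largest id seen so far.
      postings.each_with_index do |list, i|
        positions[i] += 1 while positions[i] < list.size && list[positions[i]] < target
      end
    end

    # Stop as soon as any clause has run out of documents.
    break if positions.each_with_index.any? { |pos, i| pos >= postings[i].size }
  end

  matches
end

ruby_docs  = [1, 3, 5, 8]   # made-up ids matching the term ruby
jared_docs = [3, 4, 8, 9]   # made-up ids matching author:(jared)
p intersect_postings([ruby_docs, jared_docs]) # prints [3, 8]
```

Because the lists are sorted, each cursor only moves forward, so the
work is proportional to the combined length of the lists rather than
to their product.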

Jens


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database