Utility of default_field

The documentation* states that when using a single index for multiple
models, the default_field list should be set to the same thing for
all models.

However, in my application, all my models have very different fields
and this is not possible. I still want the results returned sorted by
term frequency across all indexed content in each model.

What is the purpose of default_field? Under what multi-model
circumstance, if any, is it not necessary to use it?

Thanks,
John

*http://projects.jkraemer.net/rdoc/acts_as_ferret/classes/ActsAsFerret/ActMethods.html#M000009

Hi!

On Wed, Jan 02, 2008 at 02:30:23PM -0500, John B. wrote:

The documentation* states that when using a single index for multiple
models, the default_field list should be set to the same thing for
all models.

However, in my application, all my models have very different fields
and this is not possible. I still want the results returned sorted by
term frequency across all indexed content in each model.

Short answer:

It’s safe for you to specify the same large :default_field list,
containing fields from all models, in all your acts_as_ferret calls. aaf
doesn’t use this list itself; it only hands it through to Ferret’s query
parser, which uses it to expand queries that have no fields specified.

What is the purpose of default_field? Under what multi-model
circumstance, if any, is it not necessary to use it?

Long answer:

The default_field option determines which fields Ferret searches when
no field is explicitly specified in a query.

Suppose your index has the fields :id and :text (with id being
untokenized). With an empty default_field value (or ‘*’, which means the
same), and a :or_default value of false (as aaf sets it) you get parsed
queries like this:

‘tree’
→ ‘id:tree text:tree’

‘some tree’ (meaning some AND tree because or_default == false)
→ ‘+(id:some) +(id:tree text:tree)’

With ‘some’ being a stop word, one would expect the second query to
yield the same result as the first one. The analyzer does drop ‘some’
for the tokenized :text field, but the query is run against all fields,
including :id, which is untokenized and therefore has no analyzer. So we
end up querying the id field with a required term query (id:some) and
get no result at all.
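
The expansion above can be imitated with a few lines of plain Ruby. This is only a toy model of the behaviour described here, not Ferret's actual query parser; the stop word list, the field table and the expand helper are made up for illustration.

```ruby
require 'set'

# Toy model only: NOT Ferret's parser. Stop words and fields are assumptions.
TOY_STOP_WORDS = Set.new(%w[some the a an]).freeze
TOY_TOKENIZED  = { id: false, text: true }.freeze # :id is untokenized

# Expand each query term over the given fields. Tokenized fields pass the
# term through the (toy) analyzer, which drops stop words; untokenized
# fields keep every term verbatim.
def expand(query, fields)
  clauses = query.split.map do |term|
    hits = fields.reject { |f| TOY_TOKENIZED.fetch(f) && TOY_STOP_WORDS.include?(term) }
    hits.map { |f| "#{f}:#{term}" }.join(' ')
  end
  clauses.reject!(&:empty?)
  return clauses.first.to_s if clauses.size <= 1

  # or_default == false: every remaining clause is required
  clauses.map { |c| "+(#{c})" }.join(' ')
end

puts expand('tree', [:id, :text])      # all fields, single term
puts expand('some tree', [:id, :text]) # all fields: id:some becomes required
puts expand('some tree', [:text])      # tokenized fields only
```

With only the tokenized field in the list, the stop word disappears from the query entirely, which is the effect a proper :default_field list is meant to achieve.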

I remember there was some debate about this topic a year ago or so, and
in theory it would be possible for Ferret to parse queries the other way
around to work around this issue, but as far as I remember Dave brought
up some good reasons to leave it as it is.

The solution is to tell Ferret which fields to search when no fields are
specified for a query (or part of a query) with the :default_field
option. Usually aaf does this automatically by collecting all tokenized
fields from the model. Now with a shared index there are n models but
one index, so here we need to have a joint list of all tokenized fields
across all these models for the :default_field parameter.

Since aaf is called in every single model, I didn’t find an easy way to
build this list automatically and decided to leave it up to the user to
specify this list in the acts_as_ferret calls of every model. Not really
DRY indeed. Patches welcome ;-)
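
One way to keep the repetition manageable is to define the joint list once and reuse it in every model. The following is a minimal sketch under assumptions: the model names, field names and the shared_index_options helper are hypothetical, and only the option names :single_index and :default_field come from this thread. How the options are actually passed to acts_as_ferret (one hash or a separate hash of Ferret options) depends on the aaf version, so check the rdoc for yours.

```ruby
# Hypothetical per-model lists of tokenized fields (names are made up).
TOKENIZED_FIELDS = {
  'Article' => [:title, :body],
  'Author'  => [:name, :bio],
  'Comment' => [:body]
}.freeze

# One joint, de-duplicated list shared by every model on the single index.
SHARED_DEFAULT_FIELD = TOKENIZED_FIELDS.values.flatten.uniq.freeze

# Build the options a model would hand to acts_as_ferret. This helper is
# an illustration, not part of aaf.
def shared_index_options(model_name)
  {
    fields: TOKENIZED_FIELDS.fetch(model_name),
    single_index: true,
    default_field: SHARED_DEFAULT_FIELD
  }
end
```

Every model then gets its own :fields but the identical :default_field list, so adding a field in one place updates all models.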

Here’s a small script reproducing the issue:
http://pastie.caboo.se/134443

So to summarize:

You need to specify :default_field if you’re using :single_index => true
in combination with :or_default => false (aaf default) and you have
queries that may contain stop words and that are not constrained to a
list of fields specified in the query string.

Cheers,
Jens


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

On Jan 3, 2008, at 10:38 AM, Jens K. wrote:

You need to specify :default_field if you’re using :single_index => true
in combination with :or_default => false (aaf default) and you have
queries that may contain stop words and that are not constrained to a
list of fields specified in the query string.

Thank you Jens for your elaborate response.

Our code removes stop words from all queries before sending them to
AAF. In this case, would the lack of setting default_field ever be a
problem? Perhaps this is why we have not seen problems even though we
have never set default_field.

Cheers,
John

I added your comments to the wiki:

http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage?action=diff&version=11

On Thu, Jan 03, 2008 at 11:06:24AM -0500, John B. wrote:

Our code removes stop words from all queries before sending them to
AAF. In this case, would the lack of setting default_field ever be a
problem? Perhaps this is why we have not seen problems even though we
have never set default_field.

Exactly, in this case you shouldn’t have any problems.
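
For readers who want to take the same route, pre-filtering could look roughly like the sketch below. It is an assumption about how such a filter might be written, not John's code: the stop word list and the strip_stop_words helper are hypothetical, and the list should match whatever the analyzer of your index uses.

```ruby
# Hypothetical stop word list; keep it in sync with the index analyzer.
QUERY_STOP_WORDS = %w[a an the some of].freeze

# Drop stop words from a plain keyword query before it reaches the search.
# Naive on purpose: it splits on whitespace and does not understand
# fielded terms (title:foo), quoted phrases or boolean operators.
def strip_stop_words(query)
  query.split.reject { |word| QUERY_STOP_WORDS.include?(word.downcase) }.join(' ')
end

puts strip_stop_words('Some tree') # the stop word is removed case-insensitively
```

A query that consists only of stop words comes back as an empty string, so the caller has to decide what to do in that case.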

Cheers,
Jens
