Document scores

We’re using Ferret (but not acts_as_ferret) on a project I’m working
on, and I ran into a problem with the document scores returned from
searches.

I consider myself a Ferret noob… I know a little about its API,
having read the O’Reilly shortcut, but I couldn’t find a solution to
this problem there. Please allow me to explain:

It started when I noticed that the relevance scores of all results
were exactly the same. By reading the shortcut, I found out that this
happened because a range query (with start and end dates) was always
included in the queries passed to Ferret, and Ferret’s RangeQuery
always returns results with identical scores, because it uses a
ConstantScoreQuery internally.
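To make the effect concrete, here is a toy model in plain Ruby. It does not use Ferret at all; `Doc`, `constant_score_range`, and the sample dates are hypothetical. A constant-score range filter only decides whether a document matches, so every match gets the same score:

```ruby
# Toy model of a constant-score range query (not Ferret's implementation).
# Hypothetical document type: dates stored as "YYYYMMDD" strings.
Doc = Struct.new(:id, :date)

# Hypothetical helper: keep documents whose date lies in [lower, upper]
# and give each match the same fixed score.
def constant_score_range(docs, lower, upper, score: 1.0)
  docs.select { |d| d.date >= lower && d.date <= upper }
      .map { |d| [d.id, score] } # every match gets the same score
end

docs = [Doc.new(1, "20080105"), Doc.new(2, "20080210"), Doc.new(3, "20071231")]
p constant_score_range(docs, "20080101", "20081231")
```

Keeping dates as `YYYYMMDD` strings makes plain string comparison follow date order, which is a common way to make range queries work on text terms.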

So far, so good. As an experiment, I removed this range query from the
application code and passed Ferret a simple string that translates
into a TermQuery. From what I know of Ferret, it should return normal
scores, but all of them came back as 0.

Is this a known behavior/bug? Or did I do something wrong with the
search or the indexing? I know the latter is more likely, and if
needed I can try to provide some trimmed-down example code.


Bira

Hi Bira,

This just sounds like your search is getting no hits. The
ConstantScoreQuery was giving everything a minimum score, but no other
hits increased the score. Now you’ve removed the only thing that was
providing a score, so it’s dropped to 0.
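John’s reasoning can be sketched with another toy model (plain Ruby, not Ferret’s API; `RANGE_SCORE` and the per-document term scores are hypothetical numbers). If a combined query scores each document as the sum of its matching clauses, a term clause that contributes nothing leaves every hit with the same constant score, and dropping the range clause leaves 0:

```ruby
# Toy model of John's reasoning: a combined query scores each document as the
# sum of its clause scores (a simplification; real scorers apply extra factors).
RANGE_SCORE = 1.0 # hypothetical constant contribution of the range clause

# Hypothetical term-clause scores per document id: the term contributes nothing.
term_scores = { 1 => 0.0, 2 => 0.0 }

with_range    = term_scores.transform_values { |t| [RANGE_SCORE, t].sum }
without_range = term_scores.transform_values { |t| [t].sum }

puts with_range.values.inspect    # identical scores for every hit
puts without_range.values.inspect # every hit drops to 0.0
```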

Make sure your indexing and searching are working correctly. Try the
ferret-browser tool to review your index and see if it’s what you
expect (i.e., that it has the terms you’re searching for).
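The check John describes can be approximated outside Ferret. The plain-Ruby sketch below uses a hypothetical helper `index_terms` and made-up sample text, and its tokenizer is a crude assumption, not Ferret’s analyzer; it only illustrates that a term query can match only if the query term appears in the analyzed form of the text:

```ruby
# Crude stand-in for checking that the analyzed text contains the query term.
# Assumption: terms are lowercased runs of letters/digits; Ferret's analyzers differ.
def index_terms(text)
  text.downcase.scan(/[a-z0-9]+/).tally # term => frequency in this text
end

# Hypothetical sample text, not the Enron message from the thread.
terms = index_terms("Q4 Earnings beat estimates; earnings call at 10am.")
puts terms.key?("earnings") # a TermQuery for "earnings" could match
puts terms["earnings"]      # frequency of the term in this text
```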

If all this is working as expected, try posting a snippet of your code
where you define the index and where you do a search, and we should be
able to help.

John.


http://www.brightbox.co.uk - UK/EU Ruby on Rails Hosting
http://johnleach.co.uk

On Fri, Feb 22, 2008 at 4:29 PM, John L. [email protected]
wrote:

Hi Bira,

If all this is working as expected, try posting a snip of your code where
you define the index, and where you do a search and we should be able to
help.

John.

I’ve managed to reduce it to a simple example, which I’ve packed into
an 11 KB zip file, most of which is sample text for indexing (an e-mail
message from the publicly available Enron archive). Does the list
accept attachments?


Bira

Hi!

On Mon, Feb 25, 2008 at 04:15:00PM -0300, Bira wrote:

I’ve managed to reduce it to a simple example, which I’ve packed in an
11KB zip file, most of which is a sample text for indexing (an e-mail
message from the publicly available Enron archive). Does the list
accept attachments?

Not sure, just try it out :) or upload it somewhere on the Ferret wiki.

cheers,
Jens


Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

On Mon, Feb 25, 2008 at 4:56 PM, Jens K. [email protected] wrote:

Not sure, just try it out :) or upload it somewhere on the Ferret wiki.

OK :). I’m sending the example attached to this message.

There are two Ruby files (indexer.rb and searcher.rb), along with a
text file containing an e-mail from the Enron archives, which is the
sample to be indexed.

After extracting it to a directory, running indexer.rb will index that
single message. Running searcher.rb will perform a predefined search
on the index and print out the result and its score.

In my local environment (Ferret 0.11.6 on Linux), a single result is
returned, as expected, and it’s properly highlighted and everything.
Its score is 0. The search is a simple term query for “earnings”.

From my experience with scores, I’ve found that you have to set boosts
for each field; otherwise you’ll always get scores that are too low.

Try:

  • Configuring boosts for, say, 3 fields. E.g.: tags => 20, title => 10,
    description => 15.
  • Adding entries to the index.
  • Performing searches that hit each of these fields separately, so you
    can compare.

Then check the score in the output.
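Julio’s experiment can be mimicked with a deliberately simplified model. The field names and boost values below are the ones from his example; the helper `boosted_score`, the base score of 0.5, and the rule score = base × boost are assumptions that ignore the rest of Ferret’s scoring:

```ruby
# Simplified model: a hit's score is a base relevance multiplied by the field boost.
# Boost values from the example above; the scoring rule itself is an assumption.
BOOSTS = { tags: 20.0, title: 10.0, description: 15.0 }.freeze

# Hypothetical helper: fields without a configured boost default to 1.0.
def boosted_score(field, base)
  base * BOOSTS.fetch(field, 1.0)
end

base = 0.5 # hypothetical base score for an identical match in every field
ranking = BOOSTS.keys.sort_by { |f| -boosted_score(f, base) }
ranking.each { |f| puts "#{f}: #{boosted_score(f, base)}" }
```

With the same base match in every field, the ranking simply follows the boosts (tags, then description, then title), which is the kind of difference the experiment is meant to expose.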

Hi!

On Wed, Feb 27, 2008 at 03:04:27PM -0300, Bira wrote:
[…]

By the way, did the message containing the example arrive?

Yes, it did. I tried it out and got the same result as you: a score of
0.0.

Removing the :index => :omit_norms option from the FieldInfos
declaration leads to the expected result, a non-zero score. It’s not
clear from the API docs whether this is the expected behaviour:

  :omit_norms    Same as :yes except omit the norms file. The norms
                 file can be omitted if you don’t boost any fields and
                 you don’t need scoring based on field length.
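For context, Ferret is a port of Lucene, and in Lucene’s default similarity the per-field norm written to the norms file combines the field boost with a length factor of 1/√(number of terms). The plain-Ruby sketch below models that idea; `field_norm` is a hypothetical helper, it ignores the lossy one-byte encoding Lucene uses for norms, and it is an assumption that Ferret computes norms exactly this way:

```ruby
# Simplified Lucene-style per-field norm: boost * 1/sqrt(num_terms).
# Assumption: Ferret's default similarity behaves like this; byte encoding is ignored.
def field_norm(num_terms, boost: 1.0)
  boost * (1.0 / Math.sqrt(num_terms))
end

short   = field_norm(4)                 # short field => larger norm
long    = field_norm(400)               # long field  => smaller norm
boosted = field_norm(400, boost: 20.0)  # the boost is folded into the same value

puts short, long, boosted
```

Because the boost is folded into the norm, omitting norms also drops index-time boosts, which fits the API note that norms can only be omitted if you don’t boost any fields.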

Here’s Ferret’s explanation of the score computation:

0.0 = field_weight(message:earnings in 0), product of:
3.162278 = tf(term_freq(message:earnings)=10)
0.3068528 = idf(doc_freq=1)
0.0 = field_norm(field=message, doc=0)

It looks like Ferret should not take the zero field_norm into account
when computing the score in this case.
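The numbers in that explanation can be reproduced in plain Ruby, assuming Ferret uses Lucene’s default formulas tf = √freq and idf = 1 + ln(num_docs / (doc_freq + 1)); with freq = 10, doc_freq = 1 and a single indexed message, they match the output above. The helpers `tf`, `idf`, and `field_weight` are hypothetical. The last line shows the score a norm of 1.0 would give; Lucene uses 1.0 when norms are omitted, which is roughly what ignoring the norm would amount to:

```ruby
# Recompute the score explanation, assuming Lucene-style default formulas.
def tf(freq)
  Math.sqrt(freq)
end

def idf(doc_freq, num_docs)
  1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# field_weight is the product of tf, idf and the field norm.
def field_weight(freq, doc_freq, num_docs, norm)
  tf(freq) * idf(doc_freq, num_docs) * norm
end

freq, doc_freq, num_docs = 10, 1, 1 # term frequency, doc frequency, index size

puts tf(freq)                                    # matches 3.162278
puts idf(doc_freq, num_docs)                     # matches 0.3068528
puts field_weight(freq, doc_freq, num_docs, 0.0) # zero norm => 0.0
puts field_weight(freq, doc_freq, num_docs, 1.0) # norm of 1.0 => about 0.97
```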

Cheers,
Jens



On Tue, Feb 26, 2008 at 8:12 PM, Julio Cesar O. [email protected]
wrote:

[…] can compare.
I tried again, setting :default_boost to 1000 in the example, and the
score still came up as zero.

By the way, did the message containing the example arrive?


Bira