We’re using Ferret (but not acts_as_ferret) on a project I’m working
on, and I ran into a problem with the document scores returned from
searches.
I consider myself a Ferret noob… I know a little about its API,
having read the O’Reilly shortcut, but I couldn’t find a solution to
this problem there. Please allow me to explain:
It started when I noticed that all of the relevance scores for each
result were exactly the same. By reading the shortcut, I found out
that happened because a range query (with initial and final dates) was
always included in the queries passed to Ferret, and Ferret’s
RangeQuery always returns results with identical scores, because it
uses a ConstantScoreQuery internally.
So far, so good - I removed this range query from the application
code, as an experiment, and passed a simple string that translates
into a TermQuery to it. From what I know of Ferret, it should return
normal scores, but all of them came back as 0.
Is this a known behavior/bug? Or did I do something wrong with the
search or the indexing? I know the latter is more likely, and if
needed I can try to provide some trimmed-down example code.
This just sounds like your search is getting no hits. The
ConstantScoreQuery was giving everything a minimum score but no other
hits increased the score. Now you’ve removed the only thing that was
providing a score, so it’s dropped to 0.
Make sure your indexing and searching are working correctly. Try the
ferret-browser tool to review your index and see if it’s what you expect
(i.e., it has the terms you’re searching for).
If all this is working as expected, try posting a snippet of your code
where you define the index and where you do a search, and we should be
able to help.
John.
I’ve managed to reduce it to a simple example, which I’ve packed in an
11KB zip file, most of which is a sample text for indexing (an e-mail
message from the publicly available Enron archive). Does the list
accept attachments?
On Mon, Feb 25, 2008 at 04:15:00PM -0300, Bira wrote:
I’ve managed to reduce it to a simple example, which I’ve packed in a
11KB zip file, most of which is a sample text for indexing (an e-mail
message from the publicly available Enron archive). Does the list
accept attachments?
Not sure; just try it out, or upload it somewhere on the Ferret wiki.
OK :). I’m sending the example attached to this message.
There are two Ruby files (indexer.rb and searcher.rb), along with a text
file containing an e-mail from the Enron archives, which is the
indexable sample.
After extracting it to a directory, running indexer.rb will index that
single message. Running searcher.rb will perform a predefined search
on the index, and print out the result and its score.
In my local environment (Ferret 0.11.6 on Linux), a single result is
returned, as expected, and it’s properly highlighted and everything.
Its score is 0. The search is a simple term query for “earnings”.
On Wed, Feb 27, 2008 at 03:04:27PM -0300, Bira wrote:
[…]
By the way, did the message containing the example arrive?
Yes, it did. I tried it out and got the same result as you - score of
0.0.
Removing the :index => :omit_norms option from the FieldInfos
declaration leads to the expected result, a non-zero score. It’s not
clear from the API docs if this is the expected behaviour:
  :omit_norms | Same as :yes except omit the
              | norms file. The norms file can
              | be omitted if you don’t boost
              | any fields and you don’t need
              | scoring based on field length.
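To illustrate why a missing norm could zero out the whole score, here is a rough sketch of a Lucene-style score for a single-term query. It is plain Ruby and does not call Ferret; the method name term_score, its parameters, and the formula details are all assumptions made for illustration, not Ferret's actual code. It also assumes that a field indexed with :omit_norms ends up with a norm factor of 0 instead of a neutral 1, which is only one possible explanation for the behaviour seen here.

```ruby
# Simplified Lucene-style score for a single-term query.
# NOT Ferret's implementation: names and formula are illustrative assumptions.
def term_score(term_freq:, doc_freq:, num_docs:, field_norm:)
  tf  = Math.sqrt(term_freq)                            # term frequency factor
  idf = Math.log(num_docs.to_f / (doc_freq + 1)) + 1.0  # inverse document frequency
  query_weight = idf                    # query boost of 1.0, query norm left out
  field_weight = tf * idf * field_norm  # the norm multiplies everything else
  query_weight * field_weight
end

# One document in the index, containing the term three times.
with_norm = term_score(term_freq: 3, doc_freq: 1, num_docs: 1, field_norm: 0.5)
zero_norm = term_score(term_freq: 3, doc_freq: 1, num_docs: 1, field_norm: 0.0)

puts "field_norm 0.5 -> #{with_norm}"
puts "field_norm 0.0 -> #{zero_norm}"
```

Because the norm is a multiplier, a zero norm gives a zero score no matter how well the term matches, while any positive norm leaves a positive score. If that is what happens with :omit_norms, the hit would still be found and highlighted but would score 0.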
Here’s Ferret’s explanation of the score computation: