I created a fairly simple sample project to try out acts_as_ferret and
present the results.
The test set is relatively simple: I have extracts from 6
Wikipedia articles on several topics, which are copied into a model
that has two fields: title and text. This works quite well until I try
to use #more_like_this, which returns all of the other articles, even if
they have nothing to do with the active article. I debugged a bit and
found out that the query built by #more_like_this is nothing more than
“-id:”.
(So, given that query, the result is technically correct.)
P.S.: There is another minor bug. Although #more_like_this sets a
default for the :field_names option (line #35), that default leads to a
crash in #retrieve_terms: the default is nil, so #retrieve_terms
tries to call #each on nil (line #113).
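A minimal sketch of the failure mode described in the P.S. and one possible guard. This is not the actual acts_as_ferret code: the method name collect_field_names is hypothetical, and falling back to an empty list is only an assumption about what a sensible default might be.

```ruby
# Hypothetical stand-in for the loop in #retrieve_terms; not the real method.
def collect_field_names(options = {})
  names = []
  # Array(nil) is [], so a missing :field_names no longer breaks #each.
  Array(options[:field_names]).each { |field| names << field.to_s }
  names
end

# Without the guard, calling #each on nil raises NoMethodError.
begin
  nil.each { |field| field }
rescue NoMethodError => e
  puts "unguarded call fails: #{e.class}"
end

p collect_field_names                                   # => []
p collect_field_names(:field_names => [:title, :text])  # => ["title", "text"]
```

In the real plugin, a more useful default would probably be the list of indexed fields rather than an empty list.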
First of all, 6 documents is not really a corpus to judge the usability
of more_like_this. By default it will only consider terms occurring in
at least 5 documents to be of any relevance (:min_doc_freq option). So
if you have very different documents where the only common words are
filtered out as noise words, you’ll end up without any terms to use
for finding similar documents, which would lead to the query you
mentioned.
However, more_like_this should indeed return an empty result set in this
case.
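To illustrate the effect of a minimum document frequency on a tiny corpus, here is a small sketch in plain Ruby. It is not Ferret's implementation: the method names, the tokenizer and the sample documents are made up for illustration, and noise-word filtering is left out.

```ruby
# Six made-up "articles" on unrelated topics.
DOCS = [
  "ruby programming language",
  "python programming language",
  "alpine mountain range",
  "baroque music composers",
  "roman empire history",
  "deep sea fish"
]

# Split a document into its distinct lowercase words.
def terms_of(doc)
  doc.downcase.scan(/[a-z]+/).uniq
end

# Count in how many documents each term occurs (document frequency).
def document_frequencies(docs)
  freqs = Hash.new(0)
  docs.each { |doc| terms_of(doc).each { |term| freqs[term] += 1 } }
  freqs
end

# Keep only the terms of doc that occur in at least min_doc_freq documents.
def candidate_terms(doc, docs, min_doc_freq)
  freqs = document_frequencies(docs)
  terms_of(doc).select { |term| freqs[term] >= min_doc_freq }
end

p candidate_terms(DOCS[0], DOCS, 5)  # => []
p candidate_terms(DOCS[0], DOCS, 2)  # => ["programming", "language"]
```

With a threshold of 5, no term survives, so nothing is left to build a similarity query from; lowering the threshold lets the shared words through.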
Besides that, you should store term vectors (give :term_vector => :yes
for the fields you want to use more_like_this on in your call to
acts_as_ferret); this will speed up the search for relevant terms.
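A rough sketch of what such a field configuration might look like for the title/text model from the original post. The constant name FERRET_FIELDS is hypothetical, and passing per-field options through a :fields hash is an assumption about the acts_as_ferret call; check it against the plugin version in use.

```ruby
# Per-field options, storing term vectors for both fields.
# FERRET_FIELDS is a hypothetical name used only for this sketch.
FERRET_FIELDS = {
  :title => { :term_vector => :yes },
  :text  => { :term_vector => :yes }
}

# Assumed usage inside the model class (requires Rails and acts_as_ferret):
#   acts_as_ferret :fields => FERRET_FIELDS
#
# Assumed call afterwards, using the options mentioned in this thread:
#   article.more_like_this(:field_names => [:title, :text], :min_doc_freq => 2)

p FERRET_FIELDS
```

The index most likely has to be rebuilt after changing field options, since term vectors are written at indexing time.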
Jens
On Tue, Jul 17, 2007 at 12:11:55PM +0200, Florian G. wrote:
I am aware that the corpus is a bit small (but nicer for
presentation purposes), but it surprised me that I found no way, even
when playing with the parameters, to get at least one common word from
the set. (It wasn’t intended to be usable, just presentable.)
I will play around a bit more and add some documents. Thanks for the
hints.
Greetings
Florian G.