Quick question (possibly!) - I’ve got a few records indexed, and a
search for ‘test’ returns no hits even though I know the word ‘tests’
exists in the indexed field. A search for ‘tests’ does produce a
result. I would have thought that ‘test’ would match ‘tests’, but no such
luck!
Alastair
The default analyzer doesn’t perform any stemming. You need to create
your own analyzer with a stemmer. Something like this:

  require 'rubygems'
  require 'ferret'

  module Ferret::Analysis
    # Custom analyzer: tokenizes the text, then stems each token.
    class MyAnalyzer
      def token_stream(field, text)
        StemFilter.new(StandardTokenizer.new(text))
      end
    end
  end

  # Use the same analyzer for indexing and for searching.
  index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)
  index << "test"
  index << "tests debate debater debating the for,"
  puts index.search("test").total_hits
Hope that helps,
Dave
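For anyone who wants to see the effect without installing Ferret, here is a minimal sketch in plain Ruby. It is only a toy: `ToyIndex`, `tokenize` and `naive_stem` are made-up names, and the suffix-stripping rule is a crude stand-in for a real stemmer, not what Ferret's StemFilter does internally. It indexes the same two documents as above, once without and once with stemming.

```ruby
# Toy illustration only; none of these names belong to Ferret.

# Split text into lowercase alphanumeric tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Crude stand-in for a stemmer: strip one common suffix.
def naive_stem(word)
  word.sub(/(ing|ed|s)\z/, '')
end

class ToyIndex
  def initialize(stemming:)
    @stemming = stemming
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @doc_count = 0
  end

  # The same analysis is applied to documents and to queries.
  def analyze(text)
    terms = tokenize(text)
    @stemming ? terms.map { |term| naive_stem(term) } : terms
  end

  # Add a document: record its id under every distinct term.
  def <<(text)
    analyze(text).uniq.each { |term| @postings[term] << @doc_count }
    @doc_count += 1
    self
  end

  # Count documents that contain every term of the query.
  def total_hits(query)
    terms = analyze(query)
    return 0 if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.reduce(:&).size
  end
end

plain   = ToyIndex.new(stemming: false)
stemmed = ToyIndex.new(stemming: true)
[plain, stemmed].each do |index|
  index << "test"
  index << "tests debate debater debating the for,"
end

puts plain.total_hits("test")    # prints 1
puts stemmed.total_hits("test")  # prints 2
```

Without stemming only the exact term "test" is stored, so the second document is missed. With stemming both "test" and "tests" are reduced to the same term.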
Hi Dave,
Many thanks for the help, it does help! However, given the short timespan
for this project, I think the users of the site will just have to be a
bit more specific in their search terms. Cheers, and I will bookmark your
reply for a later project.
Alastair
class MyAnalyzer
puts index.search("test").total_hits
Ferret::Analysis::StemmingAnalyzer.new).parse(query_string)
Albert
Hi Albert,
Could you show us your implementation of StemmingAnalyzer as well.
Also, you need to be sure to use the same analyzer for both indexing
and query parsing, although I think you already knew this.
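A small plain-Ruby sketch of why the analyzer has to match on both sides. The helpers `analyze` and `all_terms_indexed?` are hypothetical, and the suffix rule is only a rough stand-in for a real stemmer, so treat this as an illustration rather than Ferret's behaviour.

```ruby
# Toy analysis step: tokenize, then optionally strip a common suffix.
def analyze(text, stem: true)
  terms = text.downcase.scan(/[a-z0-9]+/)
  stem ? terms.map { |term| term.sub(/(ing|ed|s)\z/, '') } : terms
end

# A query matches only if every query term is among the stored terms.
def all_terms_indexed?(indexed_terms, query_terms)
  query_terms.all? { |term| indexed_terms.include?(term) }
end

# The document is indexed WITH stemming, so "tests" is stored as "test".
indexed = analyze('tests debate', stem: true)

# Query analyzed without stemming: looks for "tests", which was never stored.
puts all_terms_indexed?(indexed, analyze('tests', stem: false))  # prints false

# Query analyzed with the same stemming: looks for "test" and matches.
puts all_terms_indexed?(indexed, analyze('tests', stem: true))   # prints true
```

The mismatch cuts both ways: a stemmed query against an unstemmed index can miss just as easily.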
things work as expected (modulo the stemming, of course). So, it may
be that I fundamentally misunderstand something or make a stupid mistake
…
Cheers,
Albert
Sorry, I must have been tired last night. The problem is obvious to me
now. You need to set the :fields parameter. The above query parser
should work as long as you explicitly specify all fields in your
query. For example:
"content:(ruby rails) title:(ruby rails)"
But if you want to search all fields by default then you need to tell
the QueryParser what fields exist. The Index class will handle all of
this for you including using the same analyzer as is used during
indexing. It looks like you are using the Index class for your
searches, so why not just leave the query parsing to it? Otherwise you
can get the fields from the reader.
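To make the point about default fields concrete, here is a rough plain-Ruby sketch of what "telling the parser which fields exist" amounts to: an unqualified query gets applied to every known field. `expand_to_fields` is a hypothetical helper written for illustration, not part of Ferret's QueryParser.

```ruby
# Hypothetical helper: qualify a bare query with each known field name,
# producing the explicit form shown in the example above.
def expand_to_fields(query, fields)
  fields.map { |field| "#{field}:(#{query})" }.join(' ')
end

puts expand_to_fields('ruby rails', [:content, :title])
# prints: content:(ruby rails) title:(ruby rails)
```

If the parser has no field list, it has nothing to expand a bare query against, which is consistent with the search only working when every field is spelled out.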
Alastair - if you only want to find the plural of something and not the
full stem of words, then RoR has a pluralisation capability. It will take
“test” and bring back all the plurals, or take “tests” and bring back the
singulars. You can then search on all these words. It is not a full
stemmer, but in some circumstances this may be all that you
want to do.
One thing to watch, which caught us out, is that the standard
pluralisation of words ending in ‘ss’ does not work properly.
For example, “glass” would come back as “glas” from the pluralizer.
There is a simple fix on the RoR forum that covers all this.
I would only use the RoR pluraliser if all you are looking to do is
bring back plurals of words and you are not interested in the full stemming
of the words. For example, if you do a search on “tax”, full stemming
should also search on “taxes” and “taxation”. Pluralise would not search
on “taxation”.
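A rough sketch of the plural-expansion idea in plain Ruby. `plural_variants` and `expanded_query` are hypothetical helpers with deliberately naive rules; in a Rails app you would more likely call ActiveSupport's String#singularize and String#pluralize. Joining the variants with OR assumes your search engine's query syntax accepts that keyword.

```ruby
# Hypothetical helper: return [singular, plural] using naive English rules.
# Words ending in "ss" (e.g. "glass") are treated as already singular,
# which sidesteps the "glas" problem described above.
def plural_variants(word)
  w = word.downcase
  singular =
    if w.end_with?('ss')
      w
    elsif w.end_with?('ses', 'xes', 'zes', 'ches', 'shes')
      w[0...-2]
    elsif w.end_with?('s')
      w[0...-1]
    else
      w
    end
  plural = singular.end_with?('s', 'x', 'z', 'ch', 'sh') ? "#{singular}es" : "#{singular}s"
  [singular, plural]
end

# Build a query string that searches for either form.
def expanded_query(word)
  plural_variants(word).join(' OR ')
end

puts expanded_query('tests')  # prints: test OR tests
puts expanded_query('glass')  # prints: glass OR glasses
puts expanded_query('tax')    # prints: tax OR taxes
```

As the post says, this never produces "taxation"; only a stemmer would tie that word to "tax". The naive rules also get some words wrong (irregular plurals, words like "houses"), so this is a starting point at best.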
Can someone give me an idiot’s guide to implementing this custom
stemming analyser? I do not know where to start.
Create the analyzer as David outlined it and name the file
“my_analyzer.rb”. If you put it in /app/models you don’t need any
require statements, since every .rb file in /app/models gets
automagically ‘required’ by Rails.
When you create an Index instance, pass it your analyzer, like so:

  index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)

Test your analyzer, e.g.

  index << "walking"
  index << "walked"
  index << "walks"
  index.search("walk").total_hits # -> 3
Thanks for your patience.
You’re welcome. And may I kindly ask you to use a valid email address
and perhaps your real name for future posts?
I tried to rebuild my index but it crashes out with the following error:
VoObject.rebuild_index
NameError: uninitialized constant MyAnalyzer
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing'
  from script/../config/../config/../app/models/vo_object.rb:14
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:140:in `load'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:56:in `require_or_load'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:30:in `depend_on'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:85:in `require_dependency'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:98:in `const_missing'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:131:in `const_missing'
  from (irb):11
Nasty eh?
Any idea what is going on here? Why can’t my VoObject model see the new
analyzer?
Thanks again.
I used to post with a valid email address. But then the number of spam
messages I received went from 1 or 2 a week to 50-60 a day. Ruby Forum
used to print the email addresses on the page. Here’s a compromise.
NameError: uninitialized constant MyAnalyzer
Sorry, I forgot to mention that the directory structure needs to
resemble the module nesting, i.e. the file must go in
app/models/ferret/analysis instead of just app/models.
Cheers,
Andy
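As a rough illustration of the naming convention described above, this plain-Ruby sketch maps a constant name to the file path where a loader following that convention would look. `underscore` and `expected_path` are hypothetical helpers that only approximate the rule; they are not the actual Rails code, and the `app/models` root is taken from this thread.

```ruby
# Convert a CamelCase name to snake_case, e.g. "MyAnalyzer" -> "my_analyzer".
def underscore(name)
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .downcase
end

# Each module in the nesting becomes a directory; the last part is the file.
def expected_path(constant_name, root = 'app/models')
  parts = constant_name.split('::').map { |part| underscore(part) }
  File.join(root, *parts) + '.rb'
end

puts expected_path('Ferret::Analysis::MyAnalyzer')
# prints: app/models/ferret/analysis/my_analyzer.rb

puts expected_path('VoObject')
# prints: app/models/vo_object.rb
```

Note that the error above complains about plain `MyAnalyzer`, so the model may also need to refer to the class by its full name, `Ferret::Analysis::MyAnalyzer`, for the lookup to find it.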
I’ve been trying to use the solution for stemming discussed in this
thread and have run into a bit of trouble.
I’m using this analyzer:
  module Ferret::Analysis
    class StemmingAnalyzer
      def token_stream(field, text)
        StemFilter.new(StandardTokenizer.new(text))
      end
    end
  end
The first time I search for something a new index is created in index,
and it successfully returns a set of results. The second time I search,
however, I get a strange error:
This is just a postscript correction for this thread, in case anyone else
browses to it (like I did) and gets sent down the slightly wrong track.
If you’re going to include the :analyzer option in your call to
acts_as_ferret, then it needs to live inside another option hash called
:ferret. E.g., some of the examples above say to do this: