I’m using a custom stem analyser in my searches and my indexing. The
analyser is defined thus:
module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      text.downcase!
      RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field.inspect}, text = #{text.inspect}"
      tokenizer = StandardTokenizer.new(text)
      filter = StemFilter.new(tokenizer)
      filter
    end
  end
end
I use it in my indexing like this:
acts_as_ferret({ :store_class_name => true,
                 :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                 :fields => { :property_names => { :boost => 3.0 },
                              …etc
                            } })
And in a search like this:
search_class.find_ids_with_ferret(search_term, { :limit => 10000,
    :analyzer => Ferret::Analysis::StemmingAnalyzer.new }) do |model, r_id, score|
  r_id = r_id.to_i
  ferret_ids << r_id
  self.scores_hash[r_id] = score
end
I have a problem with case sensitivity: searches only work when the
search term is lowercase, even when it looks like the text stored in
the index is capitalised. From the console:
resource.to_doc
=> {:resource_id=>"59", :property_names=>"Bb Clarinet Clarinet Family Woodwind Instrumental and Vocal Image Resources Types" }
TeachingObject.find_with_ferret("Vocal", :page => 1, :per_page => 1000).include?(resource)
=> false
TeachingObject.find_with_ferret("vocal", :page => 1, :per_page => 1000).include?(resource)
=> true
I think I have my stemming set up wrong; I’m not sure it is even being
used. I implemented it so that searches would match both pluralised and
singular terms, and that seems to work, e.g.
TeachingObject.find_with_ferret("vocals", :page => 1, :per_page => 1000).include?(resource)
=> true
But the case sensitivity thing has me stumped. I thought that the
downcase! call on the search term would make case irrelevant for
searching, but that seems not to be the case. Can anyone set me
straight?
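
To make clear what I expected, here is a minimal plain-Ruby sketch of the behaviour I am after. It does not use Ferret at all: normalize, build_index and search are hypothetical helpers of my own, and stripping a trailing "s" is only a crude stand-in for real stemming. The assumption it illustrates is that the same normalisation runs on the indexed text and on the query, so case should not matter.

```ruby
# Plain-Ruby stand-in for the analysis chain; none of this is Ferret API.

# Hypothetical helper: lowercase a copy of the text (downcase, unlike
# downcase!, leaves the caller's string untouched), split it into word
# tokens, and strip a trailing "s" as a crude substitute for stemming.
def normalize(text)
  text.downcase.scan(/[a-z0-9]+/).map { |token| token.sub(/s\z/, "") }
end

# Hypothetical toy index: maps each normalized token to the ids containing it.
def build_index(docs)
  index = Hash.new { |hash, key| hash[key] = [] }
  docs.each do |id, text|
    normalize(text).uniq.each { |token| index[token] << id }
  end
  index
end

# Look up a query by running it through the same normalize step.
def search(index, query)
  normalize(query).flat_map { |token| index.fetch(token, []) }.uniq
end

docs = {
  59 => "Bb Clarinet Clarinet Family Woodwind Instrumental and Vocal Image Resources Types"
}
index = build_index(docs)

%w[Vocal vocal vocals].each do |term|
  puts "#{term}: #{search(index, term).inspect}"
end
```

In this toy version all three terms find record 59, which is what I expected from the real index, but "Vocal" does not behave that way for me in Ferret.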