Hi,
I’ve implemented synonym searching in my Rails application, and there
is one more feature I’d like to add but can’t work out how to build.
I’d like to let the end user choose whether or not to search for the
synonyms of a word, preferably by extending the query language to
parse a construct similar to ‘%word1’ and then turn that word into an
OR list (i.e., word1|word2|word3|…).
Currently, the query parser calls SynonymTokenFilter for every token,
so synonyms are always added. Is there a way I can go about achieving
this functionality?
Here’s an overview of what I’ve done so far:
The model classes in my Rails app use acts_as_ferret with a call that
looks like this:
acts_as_ferret(
  :fields => [:body],
  :store_class_name => true,
  :ferret => {
    :or_default => false,
    :analyzer => SynonymAnalyzer.new(WordnetSynonymEngine.new, [])
  }
)
I created a SynonymAnalyzer and SynonymTokenFilter:
class SynonymAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis

  def initialize(synonym_engine, stop_words = FULL_ENGLISH_STOP_WORDS, lower = true)
    @synonym_engine = synonym_engine
    @lower = lower
    @stop_words = stop_words
  end

  def token_stream(field, str)
    ts = StandardTokenizer.new(str)
    ts = LowerCaseFilter.new(ts) if @lower
    ts = StopFilter.new(ts, @stop_words)
    ts = SynonymTokenFilter.new(ts, @synonym_engine)
  end
end
class SynonymTokenFilter < Ferret::Analysis::TokenStream
  include Ferret::Analysis

  def initialize(token_stream, synonym_engine)
    @token_stream = token_stream
    @synonym_stack = []
    @synonym_engine = synonym_engine
  end

  def text=(text)
    @token_stream.text = text
  end

  def next
    # Emit any queued synonyms before advancing the underlying stream.
    return @synonym_stack.pop unless @synonym_stack.empty?
    token = @token_stream.next
    add_synonyms_to_stack(token) unless token.nil?
    token
  end

  private

  def add_synonyms_to_stack(token)
    synonyms = @synonym_engine.get_synonyms(token.text)
    return if synonyms.nil?
    synonyms.each do |s|
      # A position increment of 0 places each synonym at the same
      # position as the original token.
      @synonym_stack.push(Token.new(s, token.start, token.end, 0))
    end
  end
end
Finally, a WordnetSynonymEngine that queries the WordNet index I created:
class WordnetSynonymEngine
  include Ferret::Search

  def initialize(index_name = "wordnet")
    @searcher = Searcher.new("#{RAILS_ROOT}/index/#{ENV['RAILS_ENV']}/#{index_name}")
  end

  def get_synonyms(word)
    @searcher.search_each(TermQuery.new(:word, word)) do |doc_id, score|
      return @searcher[doc_id][:syn]
    end
    return nil
  end
end
It works great, except that I’d really like the ability to run tokens
through the SynonymTokenFilter only when they are prefixed with an
unescaped % sign.
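One possible direction, as a rough sketch only: rather than extending the query parser itself, the query string could be pre-processed so that each unescaped %word is rewritten into an OR group before it is handed to the parser. The names expand_synonym_markers and HashSynonymEngine below are hypothetical and exist only for illustration; the only thing taken from the code above is the get_synonyms(word) method, which is assumed here to return an array of strings or nil. The " OR " joiner is also an assumption about the query syntax and can be swapped for whatever operator the parser accepts. This would only help if the analyzer used at query time does not itself apply SynonymTokenFilter.

```ruby
# Hypothetical helper: rewrites each unescaped %word in a query string
# into a parenthesised OR group of the word and its synonyms.
# `engine` is any object responding to get_synonyms(word); it is assumed
# to return an array of strings, or nil when there are no synonyms.
def expand_synonym_markers(query, engine, joiner = " OR ")
  # (?<!\\) skips markers escaped with a backslash, e.g. "\%word".
  query.gsub(/(?<!\\)%(\w+)/) do
    word = Regexp.last_match(1)
    synonyms = Array(engine.get_synonyms(word))
    terms = ([word] + synonyms).uniq
    # With no synonyms, fall back to the bare word without the marker.
    terms.size > 1 ? "(#{terms.join(joiner)})" : word
  end
end

# Stand-in engine for illustration only; it is not the WordNet-backed
# engine and simply looks words up in a hash.
class HashSynonymEngine
  def initialize(table)
    @table = table
  end

  def get_synonyms(word)
    @table[word]
  end
end

engine = HashSynonymEngine.new("fast" => ["quick", "rapid"])
puts expand_synonym_markers("%fast car", engine)
# prints "(fast OR quick OR rapid) car"
```

Words without the % marker pass through untouched, so the user decides per term whether synonyms are searched.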
Also, if anyone is interested, I can post the code for turning the
WordNet Prolog database into a Ferret index (primarily a port to Ruby
and Ferret of the Java Lucene program that does the same thing).
Thanks,
Curtis