Stop words in query

Hello all,
Quick question, I’m using AAF and the following custom analyzer:

class StemmedAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis

  def initialize(stop_words = ENGLISH_STOP_WORDS)
    @stop_words = stop_words
  end

  def token_stream(field, str)
    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
                                  @stop_words))
  end
end
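For what it’s worth, a quick way to see what the analyzer does to a stop word is something like this (the field name is arbitrary):

analyzer = StemmedAnalyzer.new

# a lone stop word: the StopFilter should swallow it completely
stream = analyzer.token_stream(:content, "the")
p stream.next   # => nil, no token survives

# a mixed phrase: only the non-stop, stemmed terms come through
stream = analyzer.token_stream(:content, "searching the index")
while (token = stream.next)
  puts token.text   # => "search", then "index"
end

So stop words never make it into the index, which is what I want; the question is what happens to them on the query side.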

However, when my search term includes a stop word, I never get any results back. Once I remove the stop word, I get the normal results. Do I need to scan my query for stop words and remove them myself, or is there something I’m doing wrong in how I pass my query to AAF?

Thanks,
Ray

Depends on how you produced your query. In general, your query has to
pass through the same analyzer that was used for indexing.

So, when building a PhraseQuery, for instance, you have to get each word
from the analyzer.

# build one phrase query, adding each keyword as it comes out of the analyzer
query    = Ferret::Search::PhraseQuery.new(:fieldname)
analyzer = StemmedAnalyzer.new

keywords.each { |keyword|
  tokenizer = analyzer.token_stream(:fieldname, keyword)
  while (token = tokenizer.next)
    query << token.text
  end
}
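The assembled query then goes to the index as usual, for example (assuming you open the index yourself with the same analyzer):

index = Ferret::Index::Index.new(:path     => '/path/to/index',
                                 :analyzer => StemmedAnalyzer.new)
index.search_each(query) do |doc_id, score|
  puts "hit #{doc_id} (score #{score})"
end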

This is how I do it; it would be nicer if AAF encapsulated this.

Regards,
Ewout

On Fri, Jan 12, 2007 at 12:07:07AM +0100, Raymond O’connor wrote:

However, when my search term includes a stop word, I never get any results back. Once I remove the stop word, I get the normal results. Do I need to scan my query for stop words and remove them myself, or is there something I’m doing wrong in how I pass my query to AAF?

Which version of aaf are you using, and what does your call to acts_as_ferret look like?

cheers,
Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

On Fri, Jan 12, 2007 at 01:05:14AM +0100, Ewout wrote:

  while (token = tokenizer.next)
    query << token.text
  end

}

This is how I do it; it would be nicer if AAF encapsulated this.

It should do this; if it doesn’t, I’d consider it a bug. There have been problems with stop words in the past, but these should finally be sorted out in current trunk.
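In other words, you should just be able to declare the analyzer once and have aaf use it for both indexing and searching. Roughly like this (from memory, so double-check the option syntax against your aaf version; model and field names are only an example):

class Article < ActiveRecord::Base
  # the second hash holds the options handed on to Ferret::Index::Index
  acts_as_ferret({ :fields => [:title, :body] },
                 { :analyzer => StemmedAnalyzer.new })
end

# stop words in the query string are then handled for you
Article.find_by_contents('over the rainbow')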

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

It should do this; if it doesn’t, I’d consider it a bug. There have been problems with stop words in the past, but these should finally be sorted out in current trunk.

I don’t see this solved in the trunk at <http://projects.jkraemer.net/acts_as_ferret/browser/trunk>.

In single_index_find_by_contents and find_by_contents, the Ferret query should be taken apart and analyzed with the analyzer the user passed in the acts_as_ferret call.
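Something along these lines, I mean (just a sketch of the idea, not actual aaf code; the names are made up):

# rebuild the query string from whatever tokens the configured analyzer
# lets through, so stop words never reach the query parser
def rewrite_query(analyzer, field, query_string)
  terms = []
  stream = analyzer.token_stream(field, query_string)
  while (token = stream.next)
    terms << token.text
  end
  terms.join(' ')
end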

Right?

Ewout

On Fri, Jan 12, 2007 at 01:15:29PM +0100, Ewout wrote:

I don’t see this solved in the trunk at <http://projects.jkraemer.net/acts_as_ferret/browser/trunk>.

In single_index_find_by_contents and find_by_contents, the Ferret query should be taken apart and analyzed with the analyzer the user passed in the acts_as_ferret call.

No, this is done by the Ferret::Index instance that aaf uses internally.
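The Index is created with the analyzer you configure, and its query parser tokenizes the query string with that same analyzer. A minimal stand-alone illustration (plain Ferret, not aaf, reusing the StemmedAnalyzer class from above):

require 'ferret'

index = Ferret::Index::Index.new(:analyzer => StemmedAnalyzer.new)
index << { :content => 'somewhere over the rainbow' }

# the query parser drops the stop word via the same analyzer,
# so this still matches
index.search_each('content:(over the rainbow)') do |doc_id, score|
  puts "#{doc_id} scored #{score}"
end
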
Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66