Range searches: sometimes they work, sometimes not

Hi, I'm using Ferret to enable geographical postcode searches. I take a
postcode and a distance in miles from the user, strip off the outcode and
then retrieve the associated x/y coordinates in metres from the db. Then I
compute a lower and an upper bound for both x and y and search for all
results that fall within that box; see the code below.

Problems start to occur when I search on big distances. For example:

40 miles from "G1"
VoObject.ferret_index.search("x:[206826 335573] AND y:[590526 719273]").total_hits
=> 165

300 miles
VoObject.ferret_index.search("y:[172098 1137702]").total_hits
Ferret::QueryParser::QueryParseException: Error occured in q_range.c:121 - range_new
        Upper bound must be greater than lower bound. "1137702" < "172098"

        from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in `parse'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in `process_query'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:560:in `do_search'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:233:in `search'
        from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:232:in `search'
        from (irb):16

So what am I doing wrong? How have other people used Ferret for
geographical searches? Is there another way that I can define the range
so that it works properly?

I ask because I'm also getting other crazy and just plain wrong results:

VoObject.ferret_index.search("y:[0 9]").total_hits
=> 167

That's telling me that all the test data is within 8 metres of the
origin...

Thanks in advance.
Clare

if their_outcode && their_outcode.size > 0
  temp_hwz = HwzPostcode.find(:first, :conditions => ['outcode = ?', their_outcode])
  # Convert the search distance from miles to metres (1 mile = 1.60934 km).
  range_x_left   = temp_hwz.x - (postcode_distance.to_f * 1.60934 * 1000)
  range_x_right  = temp_hwz.x + (postcode_distance.to_f * 1.60934 * 1000)
  range_y_top    = temp_hwz.y + (postcode_distance.to_f * 1.60934 * 1000)
  range_y_bottom = temp_hwz.y - (postcode_distance.to_f * 1.60934 * 1000)

  query += " AND x:[#{range_x_left.to_i} #{range_x_right.to_i}] AND y:[#{range_y_bottom.to_i} #{range_y_top.to_i}]"
end

On 9/20/06, Clare wrote:

[...]

Hi Clare,

Ranges are calculated according to lexical ordering, not numerical
ordering. Try this:

puts ["0", "9", "167"].sort

You’ll see that “167” does indeed fall between “0” and “9”. Now try
this:

puts ["000", "009", "167"].sort

So that should explain what you have to do. You need to pad all
numbers to a fixed width. Alternatively you could build a custom
IntegerRangeFilter and combine it with a ConstantScoreQuery. Here is
an example for Floats:

require 'rubygems'
require 'ferret'

class FloatRangeFilter
  attr_accessor :field, :upper, :lower, :upper_op, :lower_op

  def initialize(field, options)
    @field = field
    @upper = options[:<] || options[:<=]
    @lower = options[:>] || options[:>=]
    if @upper.nil? and @lower.nil?
      raise ArgumentError, "Must specify a bound"
    end
    @upper_op = options[:<].nil? ? :<= : :<
    @lower_op = options[:>].nil? ? :>= : :>
  end

  def bits(index_reader)
    bit_vector = Ferret::Utils::BitVector.new
    term_doc_enum = index_reader.term_docs
    index_reader.terms(@field).each do |term, freq|
      float = term.to_f
      next if @upper and not float.send(@upper_op, @upper)
      next if @lower and not float.send(@lower_op, @lower)
      term_doc_enum.seek(@field, term)
      term_doc_enum.each {|doc_id, freq| bit_vector.set(doc_id)}
    end
    return bit_vector
  end

  def hash
    return @field.hash ^ @upper.hash ^ @lower.hash ^
           @upper_op.hash ^ @lower_op.hash
  end

  def eql?(o)
    return (o.instance_of?(FloatRangeFilter) and @field == o.field and
            @upper == o.upper and @lower == o.lower and
            @upper_op == o.upper_op and @lower_op == o.lower_op)
  end
end
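
To make the bound handling in the filter above easier to follow, here is a minimal sketch that applies the same checks to a plain array of term strings, with no Ferret involved. The helper name `matching_terms` and the sample terms are assumptions made up for illustration; only the option keys (`:<`, `:<=`, `:>`, `:>=`) come from the filter itself.

```ruby
# Hypothetical helper (not part of Ferret): applies the same bound checks as
# FloatRangeFilter#bits, but to a plain array of term strings.
def matching_terms(terms, options)
  upper = options[:<] || options[:<=]
  lower = options[:>] || options[:>=]
  raise ArgumentError, "Must specify a bound" if upper.nil? && lower.nil?

  # A strict bound (:< or :>) is used if given; otherwise the inclusive one.
  upper_op = options[:<].nil? ? :<= : :<
  lower_op = options[:>].nil? ? :>= : :>

  terms.select do |term|
    value = term.to_f # compare as numbers, not as strings
    (upper.nil? || value.send(upper_op, upper)) &&
      (lower.nil? || value.send(lower_op, lower))
  end
end

# Sample terms, invented for this sketch.
terms = ["9", "172098", "590526", "1137702", "2000000"]

p matching_terms(terms, :>= => 172098, :<= => 1137702)
p matching_terms(terms, :> => 172098)
```

Because each term is converted with `to_f` before the comparison, the 300 mile range from the original question selects the terms one would expect, at the cost of visiting every term in the field.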

You’ll have to work out what is going on here yourself though. I have
no time for explanation. Note that this won’t perform very well
compared to the padded field version because so much is going on in
the Ruby code. I could possibly be persuaded to implement this in C.
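
For comparison, the padded field version could look roughly like the sketch below. The helper name `pad_coord` and the width of 7 digits are assumptions, not anything taken from Ferret; the width only needs to cover the largest value that will ever be indexed. The sketch also clamps negative values to 0, on the assumption that the indexed coordinates are never negative. The same padding has to be applied to the values when they are indexed, not only when the query is built.

```ruby
# Assumed field width: must be at least as wide as the largest indexed value.
WIDTH = 7

# Hypothetical helper: zero-pad a coordinate to a fixed width so that string
# order matches numeric order. Negative values are clamped to 0 (assumption:
# the indexed coordinates are never negative).
def pad_coord(value)
  format("%0#{WIDTH}d", [value.to_i, 0].max)
end

# Unpadded: compared as strings, "1137702" sorts before "172098", which is
# exactly what the "Upper bound must be greater than lower bound" error says.
puts "1137702" < "172098"                    # true

# Padded: string order now agrees with numeric order.
puts pad_coord(1137702) < pad_coord(172098)  # false

# Build the range clause with padded bounds.
query = "y:[#{pad_coord(172098)} #{pad_coord(1137702)}]"
puts query                                   # y:[0172098 1137702]
```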

Cheers,
Dave

David B. wrote:

[...]

I've also implemented a geographic search using Lucene/Ferret. There are a
couple of key points that helped me 'get it':

1 - Lucene does lexicographic, not numeric, searches, so to search on
numbers you need to convert them to strings that sort correctly
lexicographically (usually by adding leading zeros or a fixed number of
decimal places after the decimal point) [as pointed out by Dave above].

2 - A range search is actually converted into a boolean search
internally (someone please correct me if I got that wrong), so doing a
range search over massive ranges may be problematic because it can exceed
the accepted query length. Then you start a trade-off between accuracy
(more decimal places) and speed. The way I got round it was to assume
that for my purposes the search only needed to be accurate to about 100m,
so formatting longitude/latitude to 3 decimal places would work fine (I
live in a small country :-)).
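
The fixed-decimal formatting in point 1 can be sketched as follows. This is only one way to do it: the helper names and the idea of adding an offset so that negative latitudes and longitudes still sort correctly are assumptions for illustration, not a description of the code actually used.

```ruby
# Hypothetical encoder: shift the value by an offset so it is never negative,
# then write it with 3 decimal places at a fixed total width of 7 characters
# (3 integer digits, the decimal point, 3 decimals).
def encode_degrees(value, offset)
  format("%07.3f", value + offset)
end

def encode_lat(lat)
  encode_degrees(lat, 90)   # -90..90 becomes 0..180
end

def encode_lon(lon)
  encode_degrees(lon, 180)  # -180..180 becomes 0..360
end

# Sample latitudes, invented for this sketch.
lats = [55.861, -33.9, 5.5, -0.001]

# Sorting the encoded strings gives the same order as sorting the numbers.
puts lats.map { |lat| encode_lat(lat) }.sort
```

Three decimal places of a degree of latitude is roughly 110m, which matches the accuracy mentioned above.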

Sam

On 9/21/06, Sam G. wrote:

2 - A range search is actually converted into a boolean search
internally (someone please correct me if I got that wrong), so doing a
range search over massive ranges may be problematic because it can exceed
the accepted query length. Then you start a trade-off between accuracy
(more decimal places) and speed. The way I got round it was to assume
that for my purposes the search only needed to be accurate to about 100m,
so formatting longitude/latitude to 3 decimal places would work fine (I
live in a small country :-)).

This used to be correct, but it is no longer the case in either Ferret
or Lucene (version 2.0). RangeQueries get reduced to
ConstantScoreQueries which use a Filter. So Sam, you can now feel free
to use RangeQueries with as large a Range as you like :-).
WildcardQueries, FuzzyQueries and PrefixQueries do however get
rewritten as BooleanQueries in Lucene and MultiTermQueries in Ferret
so you do need to be careful when using these queries. Ferret’s
MultiTermQuery is a lot more efficient than a BooleanQuery for this
task, so it allows a lot more clauses than you could probably use
efficiently in Lucene. Also, the query “*” gets rewritten as a
MatchAllQuery so it is safe to use.

Cheers,
Dave