i have a pretty sweet document search site running on ferret. i’m going
to be updating my blog with everything i learned about production use of
aaf and ferret shortly.
but first, i have a question…
using the trunk of aaf, rebuild_index creates a new but empty index
(only the segments files), which is not what i want. i want an updated
version of the table data in question. i’m sure rebuild_index has been
tested in the trunk so it must be my fault, not understanding something
about it.
i’m disabling ferret on the class level in environment.rb and overriding
the ferret_enabled? method in aaf to only return true if the state is
‘approved’ or ‘active’.
Document.disable_ferret

module ActsAsFerret
  module InstanceMethods
    def ferret_enabled?(is_bulk_index = false)
      @ferret_disabled.nil? &&
        (is_bulk_index || self.class.ferret_enabled?) &&
        (self.state == "approved" || self.state == "active")
    end
  end
end
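to make it easier to see which records that override lets through, here is a rough stand-alone sketch of the same boolean expression. it does not load aaf or ActiveRecord; FakeDocument and the "pending" / "rejected" states are made up for illustration only.

```ruby
# Stand-in for the override above. FakeDocument is hypothetical and
# does not load acts_as_ferret or ActiveRecord.
class FakeDocument
  attr_reader :state

  # class-level switch, mimicking Document.enable_ferret / disable_ferret
  def self.ferret_enabled?
    @ferret_enabled
  end

  def self.enable_ferret
    @ferret_enabled = true
  end

  def self.disable_ferret
    @ferret_enabled = false
  end

  def initialize(state)
    @state = state
    @ferret_disabled = nil
  end

  # same boolean expression as the override in the post
  def ferret_enabled?(is_bulk_index = false)
    @ferret_disabled.nil? &&
      (is_bulk_index || self.class.ferret_enabled?) &&
      (state == "approved" || state == "active")
  end
end

FakeDocument.disable_ferret
# "pending" and "rejected" are assumed example states, not from the app
docs = %w[pending approved active rejected].map { |s| FakeDocument.new(s) }

# states that pass when is_bulk_index is true, even with the class disabled
bulk_states = docs.select { |d| d.ferret_enabled?(true) }.map(&:state)
# states that pass on an ordinary save while the class is disabled
save_states = docs.select { |d| d.ferret_enabled? }.map(&:state)

puts "bulk: #{bulk_states.inspect}"
puts "save: #{save_states.inspect}"
```

if the expression behaves the same way inside aaf, only approved and active records should pass during a bulk index, and nothing should pass on a normal save while the class is disabled.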
then i’m bulk indexing any new documents via a rake task using
acts_as_state_machine to only index documents that have an approved
state…
namespace :ferret do
  desc "Updates the ferret index for the application of approved documents."
  task :index => :environment do
    docs = Document.find(:all, :conditions => "state = 'approved'")
    unless docs.empty?
      Document.bulk_index docs
      docs.each { |r| r.activate! } # acts_as_state_machine
      puts "Completed Ferret Indexing of #{docs.size} approved documents."
    else
      puts "No documents to index"
    end
  end

  # run daily
  desc "Updates the ferret index with the new count increments."
  task :update => :environment do
    Document.enable_ferret
    Document.rebuild_index
    puts "Completed Ferret Re-Indexing of active documents."
  end
end
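the state flow the :index task relies on can be sketched without any of the real libraries. this is only an illustration: Doc, index_pass and the "pending" state are hypothetical, and the `indexed` array is a placeholder for whatever Document.bulk_index writes to the ferret index.

```ruby
# Hypothetical stand-in for the Document model. No ActiveRecord,
# acts_as_state_machine or acts_as_ferret is loaded here.
Doc = Struct.new(:id, :state) do
  # mimics the activate! event: approved -> active
  def activate!
    self.state = "active" if state == "approved"
  end
end

# one pass of the :index task; `indexed` stands in for the ferret index
def index_pass(all_docs, indexed)
  docs = all_docs.select { |d| d.state == "approved" }
  return 0 if docs.empty?
  indexed.concat(docs.map(&:id)) # placeholder for Document.bulk_index docs
  docs.each { |d| d.activate! }
  docs.size
end

all_docs = [Doc.new(1, "pending"), Doc.new(2, "approved"), Doc.new(3, "approved")]
indexed  = []

first_run  = index_pass(all_docs, indexed)
second_run = index_pass(all_docs, indexed)

puts "first run indexed #{first_run}, second run indexed #{second_run}"
puts "index now holds ids #{indexed.inspect}"
```

in this sketch each approved document is picked up once, then moves to active, so a second pass finds nothing new to index.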
now, the Document Model has a counter cache to track downloads. so
every day i’d like to run the :update task in the ferret rake task
because i’ve got a sort for popularity based on the downloads_count.
here’s my aaf declaration in the model. i believe i’m doing this right.
class Document < ActiveRecord::Base
  acts_as_ferret :fields => {
    :title               => {:boost => 5},
    :description         => {:boost => 1},
    :cached_tag_list     => {:boost => 10},
    :downloads_count     => {:index => :untokenized_omit_norms, :term_vector => :no},
    :created_at_for_sort => {:index => :untokenized_omit_norms, :term_vector => :no},
    :state               => {:boost => 0}
  }, :remote => true

  def created_at_for_sort
    return self.created_at.to_i
  end
  #…
end
this solution came from a lot of trial and error and many varied
sources: blogs and the o’reilly ferret book. it works great in production,
except for updating the index via rebuild_index, so at the moment
i can’t sort by the new counter_cache number.
I hope this sufficiently explains the problem I’m having. Any help
would be appreciated.