I am getting ready to add searching to the property model of a real estate
site I am working on, and I am looking for advice on which search plugin
to use.
I came to Rails to escape Java, and I really don’t want to run a Java
server, which as far as I know rules out acts_as_solr.
So it looks like a choice between Soda Search and acts_as_ferret. How well
will these two handle normalized vs. non-normalized data? For example, if
my model has relations to other objects (has_many etc.), will these search
plugins index the items owned by my model as well?
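To make the has_many question concrete, here is a minimal sketch of one plugin-agnostic workaround: expose a method that flattens the model's own text plus the text of the records it owns, then index that as a single field. `Property`, `Room`, and `search_text` are hypothetical names, plain Structs stand in for ActiveRecord models, and whether a given plugin can index a method like this is something to check in its documentation.

```ruby
# Hypothetical stand-ins for ActiveRecord models; in a Rails app this
# would be a Property that has_many :rooms.
Room = Struct.new(:name, :description)

Property = Struct.new(:title, :description, :rooms) do
  # Flatten the property's own fields plus the text of every owned
  # room into one string that could be indexed as a single field.
  def search_text
    own   = [title, description]
    owned = rooms.flat_map { |room| [room.name, room.description] }
    (own + owned).compact.join(" ")
  end
end

property = Property.new(
  "Lakeside cottage",
  "Two bedroom cottage",
  [Room.new("Kitchen", "granite counters"), Room.new("Deck", nil)]
)

puts property.search_text
```

The trade-off is that the flattened text has to be rebuilt whenever an owned record changes, so the parent usually needs to be reindexed from the child's save callback.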
Ferret has segfaulted a number of times for me, taking down my Mongrel
processes in a flaming ball of fire. I have monit checking my
processes, so they get restarted, but for some reason that can take up
to 10 minutes.
I’m currently in the process of switching all my acts_as_ferret to
tsearch2.
I’ve seen problems in people’s apps with Ferret as the index gets
bigger: it will sometimes segfault and crash Mongrel, or you will
start to get Ferret locking errors on the index when multiple Rails
processes try to read from and write to the same index.
HyperEstraier and Sphinx have held up better than Ferret with bigger
indexes in the apps I’ve seen.
I did hear that they are working on a DRb daemon that will be the
only thing writing to the Ferret index, with your app talking to it
over DRb. This may fix the locking and index corruption issues, but I
haven’t seen it in the wild yet.
Ferret also has stemming. Ferret actually has more full-text searching
capabilities than tsearch2; however, I’ve never needed more than
tsearch2 can offer. tsearch2 is also faster when you need to combine
full-text conditions with other conditions. Since tsearch2 is built on a
GiST index, you can build multi-column GiST indexes that include the
tsvector column.
The “problem” with tsearch2 is that it is Postgres-specific. Not a problem
for me, as I use ONLY Postgres. But if the app is MySQL-based you are out
of luck.
But tsearch2 has a huge advantage over Ferret: it has lexical
capabilities. This means that searching for plurals or otherwise
modified words will find them and the related words. For example,
searching for rabbit or searching for rabbits will both bring back all
docs with rabbit or rabbits. This is a VERY cool feature, and it’s
language-specific. There are stop-word dictionaries for many, many
languages.
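For anyone unsure what that buys you, here is a rough sketch of the idea in plain Ruby: terms are reduced to a common stem at index time and again at query time, so singular and plural land on the same index entry. `toy_stem`, `build_index`, and `search` are made-up helpers, and the suffix stripping is deliberately naive; it is not how tsearch2's dictionaries or Ferret's stemming actually work.

```ruby
# Toy normalizer: lowercases and strips a few English plural endings.
# Real stemmers are far more thorough; this only illustrates the idea.
def toy_stem(word)
  w = word.downcase
  if w.end_with?("ies") && w.length > 4
    w[0...-3] + "y"          # "properties" -> "property"
  elsif w.end_with?("s") && !w.end_with?("ss") && w.length > 3
    w[0...-1]                # "rabbits" -> "rabbit"
  else
    w
  end
end

# Build an inverted index: stemmed term => ids of documents containing it.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    text.scan(/[a-zA-Z]+/).map { |t| toy_stem(t) }.uniq.each do |term|
      index[term] << id
    end
  end
  index
end

# Stem the query the same way the documents were stemmed.
def search(index, query)
  index.fetch(toy_stem(query), [])
end

docs = {
  1 => "Rabbits eat carrots",
  2 => "A rabbit in the garden",
  3 => "Properties near the lake"
}
index = build_index(docs)

p search(index, "rabbit")
p search(index, "rabbits")
```

Both queries return the same document ids because the query and the indexed text go through the same normalization step.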
I use Postgres too, so is it better to use tsearch2 than Ferret (with a
really big database)?
Here are a couple I thought were helpful; they touch on the major issues:
sorting vs. searching, stemming/tokenization, stop words, UTF-8 (I
think), and tf/idf calculations.
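Since tf/idf comes up in most of these comparisons, here is a small sketch of the textbook calculation, assuming documents are already tokenized into arrays of words. `tf_idf` is a made-up helper; real engines use their own variants with smoothing and length normalization, so treat this as an illustration only.

```ruby
# tf  = occurrences of the term in the document / terms in the document
# idf = log(number of documents / number of documents containing the term)
def tf_idf(term, doc_tokens, all_docs_tokens)
  return 0.0 if doc_tokens.empty?

  tf = doc_tokens.count(term).to_f / doc_tokens.size
  containing = all_docs_tokens.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?

  idf = Math.log(all_docs_tokens.size.to_f / containing)
  tf * idf
end

docs = [
  %w[house with garden],
  %w[house with pool],
  %w[flat with garden and garage]
]

docs.each_with_index do |tokens, i|
  puts "doc #{i}: pool=#{tf_idf('pool', tokens, docs)} with=#{tf_idf('with', tokens, docs)}"
end
```

A word that appears in every document ("with" here) scores zero, which is the same intuition behind stop-word lists: terms that are everywhere tell you nothing about which document is relevant.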