I’ve got a smallish site with not a ton of data at the moment… but
all that could change at some point so I’d like to plan with that in
mind. Currently I’m deployed on an nginx/mongrel stack that works
quite well. My site uses Ferret for search and it’s ok… the big
problem is that some terms don’t show up as expected… especially if
there are apostrophes, plurals, etc involved.
I’ve got two choices that I see… pony up for the O’Reilly mini-PDF and
tweak ferret settings or scrap ferret and go with Sphinx (and hope it
handles cases like this better). I’m not sure how much time the
latter would take me but, assuming that I’m going to spend somewhere
around 40 hours anyway, which route would you all recommend?
We’ve used ferret on past projects… and now use sphinx. We’re not
likely going back to ferret.
Robby
–
Robby R.
Founder and Executive Director
PLANET ARGON, LLC
Design, Development, and Hosting with Ruby on Rails
Can you elaborate on why? I’m mostly just curious.
To the parent…
the ferret PDF booklet is pretty full of good information
if you stick with ferret. I don’t, however, remember if it discusses how
to handle words with apostrophes in them. It does talk about how to handle
plurals via the StemFilter though.
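To make the apostrophe and plural problem concrete, here is a minimal plain-Ruby sketch of the kind of normalization an analyzer applies to both the indexed text and the query. It is not Ferret's StemFilter or any Ferret API; the names `normalize_term` and `tokenize` are made up for illustration, and the plural rules are deliberately naive compared with a real stemmer.

```ruby
# Illustration only: plain Ruby, not Ferret's analyzer or StemFilter.
# normalize_term and tokenize are hypothetical helper names.

def normalize_term(word)
  term = word.downcase
  term = term.tr("\u2019", "'")   # treat curly and straight apostrophes alike
  term = term.sub(/'s\z/, "")     # drop a possessive suffix ("vince's" -> "vince")
  term = term.delete("'")         # drop remaining apostrophes ("o'reilly" -> "oreilly")

  # Very naive plural stripping; a real stemmer covers far more cases.
  if term.length > 3
    if term.end_with?("ies")
      term = term[0...-3] + "y"   # "ponies" -> "pony"
    elsif term =~ /(s|x|z|ch|sh)es\z/
      term = term[0...-2]         # "indexes" -> "index"
    elsif term.end_with?("s") && !term.end_with?("ss")
      term = term[0...-1]         # "ferrets" -> "ferret"
    end
  end
  term
end

# Split text into words (keeping apostrophes inside words), then normalize each.
def tokenize(text)
  text.scan(/[[:alnum:]'\u2019]+/).map { |w| normalize_term(w) }
end
```

The point is that the same function has to run at index time and at query time: if "O’Reilly's" is indexed as one form and the query is normalized to another, the term will not match, whichever engine is underneath.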
Ferret is unstable in production. Segfaults, corrupted indexes
galore. We’ve switched around 40 clients from ferret to sphinx and
solved their problems this way. I will never use ferret again after
all the problems I have seen it cause in people’s production apps.
Plus sphinx can reindex many, many times faster than ferret and uses
less CPU and memory as well.
Ferret has been very unstable for us. It is unfortunate because it seems
like it would be more customizable than Sphinx. But I must admit that I
like that Sphinx can take the data by itself from MySQL and index it really
fast.
AEM
A decent search option is Lucene via the acts_as_solr plugin.
I’ve never used Sphinx though. Can anyone with firsthand experience of
both Lucene and Sphinx give their opinion?
–
Alexey V.
We have a bunch of clients using solr as well. In general it is more
powerful than sphinx but a lot slower to reindex and query. Also it
uses 50 times the memory of sphinx. If you have a box or VM to put
Solr on by itself then it is a good option as well. But if sphinx can
do everything you need from a search indexer then it is a way better
option cost-wise.
Huh. I must be lucky. Or not have that much to index (true) or users
don’t complain about not finding anything (probably very true)
I’ll have to give sphinx a go next time around… thanks Ezra
Just out of interest, were corrupted indexes seen even with only one
process writing to the index (via DRb as is recommended)? Multiple
writers are unsupported and cause these kinds of problems.
Segfaults were quite common in older versions too, but it’s settled down
now and I’ve had it rather stable in a few small production sites
(though I’m not talking Twitter-like load :).
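For anyone unsure what "only one process writing" looks like in practice, here is a rough sketch of the single-writer idea. It is an assumption-laden illustration: it uses an in-process `Queue` and one writer thread instead of a DRb server, and `SingleWriterIndex` is a made-up stand-in, not Ferret's or acts_as_ferret's API. Callers never touch the index directly; they hand work to the one writer.

```ruby
# Illustration only: an in-process stand-in for the "one writer" setup.
# SingleWriterIndex is hypothetical and does not use Ferret or DRb.
class SingleWriterIndex
  def initialize
    @docs  = {}          # id => text; stands in for the on-disk index
    @queue = Queue.new   # write requests from any number of callers
    @writer = Thread.new do
      # Only this thread ever mutates @docs.
      while (job = @queue.pop) != :stop
        id, text = job
        @docs[id] = text
      end
    end
  end

  # Callers enqueue a write instead of modifying the index themselves.
  def add(id, text)
    @queue << [id, text]
  end

  # Process everything already queued, then stop the writer thread.
  def close
    @queue << :stop
    @writer.join
  end

  def size
    @docs.size
  end
end

index = SingleWriterIndex.new
producers = 4.times.map do |n|
  Thread.new { 25.times { |i| index.add("#{n}-#{i}", "doc #{i}") } }
end
producers.each(&:join)
index.close
puts index.size
```

With a DRb server the shape is the same, except the queue boundary is a network call from each app process to the one process that owns the index.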
Yes, we have tried every way possible of running ferret: by itself,
with the DRb server, etc. I really like ferret’s interface and integration with
rails but unfortunately it causes nothing but problems for so many
people that I cannot recommend it with a straight face. Not meaning to
bash on the ferret devs here at all, just stating what I’ve seen
across hundreds of deployments.
On Fri, 2008-01-04 at 11:26 -0500, Vince W. wrote:
latter would take me but, assuming that I’m going to spend somewhere
around 40 hours anyway, which route would you all recommend?
Hi Vince,
They’re different tools really. I’ve found the flexibility of Ferret to
be really quite awesome. I can (in Ruby):
- set boost values independently per field and per record
- write custom text tokenizers, stemmers and stop lists (and use
  different ones per field even)
- highlight matches in results using the same engine that does the
  searching
- manage my own indexes, merging them at will, or just merging results
  from them
- index content generated on the fly, without having to store it in my
  SQL database (pull in all the associated tags for a post as you index it,
  for example)
- store original data in the index (though most people use it to index
  an SQL database anyway)
- other awesome stuff I can’t remember right now
Looking at the documentation for Sphinx (and its usual usage, with
MySQL), many (if not all) of those features are missing. But Sphinx is
reportedly quicker, supports distributed searching, and appears to be
undergoing more development than Ferret is at the moment, so I think it
depends on your needs.
I’d recommend you ask on the Ferret mailing list about your search
result issues though - I’m surprised you’re having problems with that.
I’m sure it can be solved.
I don’t have firsthand experience with sphinx, but I can confirm
that given a decent hardware setup, solr (with acts_as_solr) is really good
(not only in terms of performance but also flexibility and
functionality). We used it for miojob.it, where it powers almost every
aspect of the site, which is built around faceted browsing of job
postings and has only a few spots where caching was appropriate.
It does this without sweating under traffic in the multi-hundred-K hits
per day range (I don’t have the real numbers).
Anyhow, given the lower system requirements, I’d like to give
sphinx a try to see what it can do!
I’ve been humming and hawing all weekend about whether or not to put
in the time to use Sphinx, and I guess the mountain of evidence is
clear: I’ll be moving my project over to Sphinx today.
I’ve been using Ferret since its beginning, I’m also the French
translator of the Ferret Shortcut for O’Reilly, and I can tell you one thing: don’t
use Ferret.
It’s really unstable, and development stopped a while ago…
That’s really sad, because it was really an AWESOME product, but it never
reached a stable state.
I’ve also experienced huge problems with acts_as_solr, so in the end I’d
just say “use Sphinx”. For me that’s the safest decision.
Ya we use ferret right now on our site. It’s ok, but it does segfault
about once a week. It’s not a huge deal I suppose, but doesn’t make me
feel good. Right now I’m evaluating switching to solr or sphinx. It
would be nice to have the ‘more like this’ ability that AAF/Ferret has.
I didn’t really see this feature with sphinx. We would also like to be
able to write a custom sort method, which I haven’t been able to do with
ferret. I see there’s an ability to do that with sphinx which looks
nice.
Anyways, can anyone recommend a sphinx plugin for Rails?
There are 3 so far that I’ve found: acts_as_sphinx, ultrasphinx, and
sphinctor. Are they all actively updated?
I’m not sure about acts_as_sphinx and sphinctor being actively
updated, but I can confirm that both Ultrasphinx and Thinking Sphinx
(my own plugin - http://ts.freelancing-gods.com) are regularly updated
and under the hood they both use the same Ruby Sphinx client -
Riddle (http://riddle.freelancing-gods.com - again, mine - sorry for
blowing my own trumpet), which I’ve been keeping up to date to match
the recent releases of Sphinx.
Evan’s and my plugins do a lot of the same things, just different
approaches, so, with as little bias as possible, I think either can do
the job for you. I can’t speak for the other two plugins though, as
it’s been so long since I’ve looked into them.
…
How difficult would it be to change over to Sphinx?
That would really depend on how you’re hooking up with Ferret and whether you
were using any advanced features. My guess is that it shouldn’t be too hard
to switch.
I’ve been playing with Ferret for a while. I actually get corrupted
indexes just running in development. I’m close to deploying an app
that uses ferret and some of the things I’ve heard really worry me.
Haven’t had a chance to test the drb server though, but the whole idea
of that bothers me too.
How difficult would it be to change over to Sphinx?