Road map of ferret

Hi all,

I’m new to the list and glad to participate.
I would like to ask some questions about the Ferret project…

Please don’t take these questions as offensive; I would really like to know
how reliable Ferret is for a long-lived product.
At my company we are planning to build a big product with an indexing
engine, and I would like to know whether Ferret is “alive”.
Thanks for the answers!


Atenciosamente - Best regards,

Fernando Luiz Parisotto

I think many people here have the same questions.

Fernando Parisotto wrote:

I have the same questions about ferret.

I’ve been using Ferret in a project still under development, and it
works pretty well. As far as I can tell, the project is dying, if not
already dead. David B. is still the only listed developer, and
he seems to have moved on to other things. However, since the
software is still meeting my project’s needs, I am not terribly
bothered by that. I suppose that eventually (in a few years?)
something will change enough that Ferret will stop working, and then
we’ll have to find something else.

If you can find an alternative that has active development, I would
recommend you go with that. (And if you find one, please post about
it.) But, if you can’t, Ferret will probably be good enough for a
while.

On Tue, Aug 19, 2008 at 3:24 PM, Fernando Parisotto
[email protected] wrote:


Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk


Paul L.
Aquilent, Inc.
National Library of Medicine (Contractor)

On Aug 27, 2008, at 8:20 AM, Eric S. wrote:

If anyone knows of any ruby IR projects which are mature, and are
being actively developed I would love to hear about them.

FWIW, I recently finished porting all module code in KinoSearch to C.
If we write binding code and port the test suite, it will be usable
from Ruby.

KinoSearch is sort of a sister project to Ferret. The dev branch
implements many of the ideas that Dave Balmain and I designed together
for the Lucy project.

Marvin H.
Rectangular Research
http://www.rectangular.com/

I would also be interested in Ferret alternatives for IR in ruby, a
simple search on rubyforge returned mainly a bunch of projects that
look to be abandoned…

  • Rise (does not appear to be actively developed)
  • rubylucene (looks to be a dead project)
  • Ruby Simple Indexer (also looks dead)
  • Ruby Odeum (simple ruby-bindings for a fast inverted index)

If anyone knows of any ruby IR projects which are mature, and are
being actively developed I would love to hear about them.

Thanks – Eric


Reformatted excerpts from Eric S.'s message of 2008-08-27:

If anyone knows of any ruby IR projects which are mature, and are
being actively developed I would love to hear about them.

Much less useable API than Ferret, and you have to run it as a separate
server process, but it’s fast, stable, and actively maintained.

As far as I know, Sphinx can only index tables that have a unique
numeric id (e.g. an auto-incrementing int)… I looked at using it, but
we use md5 hashes for the id/primary key on the tables I want to index…
so we were out of luck.
For what it’s worth, I use Ferret 0.11.6 and love it. I re-index about
90 million rows (and growing) worth of “stuff” (title, description, author,
etc…) every night… works like a champ. Searching is fast (provided you
don’t want to sort on something other than relevance) and accurate.

How bout Sphinx?

Thanks for all the info, I just found a very good related discussion
from ruby-forum which I thought I’d share

http://www.ruby-forum.com/topic/137629


schulte



On Aug 27, 2008, at 11:36 AM, Eric S. wrote:

What is the status of the Lucy project?

The dev branch of KinoSearch is basically Lucy. When Dave became
unavailable, I didn’t really have anyone else to bounce ideas off of
for Lucy (since it was a from-scratch project without a community), so
I returned to the established KS community – but took the code base
in the direction that Dave and I had worked out.

My current plan is to make an official KinoSearch release for Perl,
write some experimental bindings for other languages, achieve
stability, then make KinoSearch the “maint” branch and Lucy the “dev”
branch.

Also, I may be missing something obvious here, but I don’t understand
why there is no ruby API directly to the Lucene Java library,

If you want to use Lucene, just go with Solr.

Marvin H.
Rectangular Research
http://www.rectangular.com/

On Wed, 2008-08-27 at 08:34 -0700, Marvin H. wrote:

FWIW, I recently finished porting all module code in KinoSearch to C.
If we write binding code and port the test suite, it will be usable
from Ruby.

KinoSearch is sort of a sister project to Ferret. The dev branch
implements many of the ideas that Dave Balmain and I designed together
for the Lucy project.

Hi Marvin,

In my experience the Ruby community is crying out for a “drop-in”
replacement for Ferret. Sphinx is great, but different. Xapian looks
good but doesn’t have the Ruby maturity of Ferret yet (especially
considering acts_as_ferret). I keep coming across people using Ferret
successfully but have little niggles here and there.

Is KinoSearch something that could be a Ferret replacement? Or the
foundations of a Ferret replacement? What are the differences between
it and Ferret?

Out of interest, what are the differences between it and the planned
Lucy project (would be good to hear more about what your plans were for
Lucy. Maybe it’ll inspire somebody else?)

Do you happen to know if Dave is likely to work on Ferret again someday?
I think we’ve seen some commits from him recentlyish but no word I’ve
seen. Hope all is well.

Thanks,

John.

http://johnleach.co.uk

On Wednesday, August 27, at 08:34, Marvin H. wrote:

KinoSearch is sort of a sister project to Ferret. The dev branch
implements many of the ideas that Dave Balmain and I designed
together
for the Lucy project.

What is the status of the Lucy project? A ruby api into the venerable
library Lucene seems to be the obvious first step towards developing a
truly stable effective IR solution for Ruby. The last update on the
Lucy webpage (Apache Lucy) seems to be from 2006.

Also, I may be missing something obvious here, but I don’t understand
why there is no ruby API directly to the Lucene Java library, why
would the only Lucene/Ruby API be to the C-port of lucene?

Much Thanks – Eric

Hi,

On 28 Aug 08, at 12:11, John L. wrote:

In my experience the Ruby community is crying out for a “drop-in”
replacement for Ferret. Sphinx is great, but different. Xapian looks
good but doesn’t have the Ruby maturity of Ferret yet (especially
considering acts_as_ferret). I keep coming across people using Ferret
successfully but have little niggles here and there.

The best would probably be to have some of us dig into Ferret and
help fix the remaining bugs!

I’d like to experiment with beanstalkd
http://xph.us/software/beanstalkd/
which - I’ve been told - is a better alternative to Drb for
background indexing.

I’m still using Ferret on many websites, and it’s so simple to use, so
why use something else?

On Aug 27, 2008, at 11:20 AM, Eric S. wrote:

If anyone knows of any ruby IR projects which are mature, and are
being actively developed I would love to hear about them.

disclaimer: highly opinionated response follows… :-)

Solr is the way to go for Ruby projects*. solr-ruby, if I do say so
myself, ain’t half bad. It’s downright beautiful to interact with
Solr via Ruby: http://wiki.apache.org/solr/solr-ruby. I have plenty
of wishes for where solr-ruby could still evolve, so it’s not done
yet. * pragmatically I realize that another moving piece, especially
a JVM, isn’t a good fit for many current production deployment
environments. See below for my answer to that…

Ferret is awesome, let me be clear about that! I have always loved
its power, even beyond Lucene Java in some cases. But I’ve stuck
with Lucene through the tough times and it’s always been good to me.
Solr’s goodness on top of Lucene Java makes it extremely compelling for
every environment, be it Ruby, Python, Java itself, what have you.
I’ve always been fonder of the JVM than native C stuff, and when
Ferret went that direction I stuck with Java.

acts_as_solr, however, hasn’t yet reached its potential - and my
little hack that kick started it wasn’t really beneficial to the
community, my apologies - since I basically “abandoned” it. But it
ain’t half bad either thanks to Thiago’s hard work, and does make cake
work out of RDBMS ↔ Solr, whereas it takes something this ugly to do
it in Java: http://wiki.apache.org/solr/DataImportHandler (oh Ruby
how I love you!).

Solr is incredibly powerful, beyond the features I think almost all of
the other open source search engines offer. Its scalability evolves
almost daily, as do its pluggability capabilities.

And for those JRuby folks out there… well, I guess there aren’t
(m)any of those on the ferret list, but think about the
possibilities… SolrJRuby! Wow.

Erik

On Aug 27, 2008, at 3:28 PM, Marvin H. wrote:

On Aug 27, 2008, at 11:36 AM, Eric S. wrote:

Also, I may be missing something obvious here, but I don’t understand
why there is no ruby API directly to the Lucene Java library,

Mainly because Ruby has been too slow to have something pure. Ferret
is about as close as it gets to Lucene Java compatibility, and really
only diverged from the file format because of wise practical reasons.

If you want to use Lucene, just go with Solr.

+1

Solr is great in Ruby environments too. Really it is. Sure, there’s
this JVM beast, and deployment issues, and all that, but they
generally aren’t that painful. And the benefits are totally worth it.

Erik

On Aug 28, 2008, at 9:52 AM, Jens K. wrote:

So here’s my very own biased opinion just to complete the picture :-)

Hey, software should be opinionated! That’s totally fair :-)

(shameless plug: selfhtml.org search will be powered by Stellr
[1] ;-).

Stellr - great name. Interesting… that’s pretty sweet.

Solr, while being an interesting project without doubt, won’t ever
reach the level of Rails integration that’s possible with
acts_as_ferret, simply because its server doesn’t run in the
context of the rails app with model classes and all that stuff.

What advantage does Ferret have in terms of ActiveRecord integration
that Solr wouldn’t have?

If you’re talking about custom analyzers being in Ruby, more on that
below.

It’s an independent server indexing whatever you throw over the
fence via http+xml.

Solr can now index CSV as well as a relational database directly (with
the new DataImportHandler).

It also responds with Ruby hash structure (just add &wt=ruby to the
URLs, or use solr-ruby which does that automatically and hides all
server communication from you anyway).
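
As a rough illustration of the `wt=ruby` point, here is a minimal sketch that only builds the request URL. The base URL `http://localhost:8983/solr`, the `/select` path and the helper name `solr_select_url` are assumptions made for this example, not something stated in this thread; solr-ruby hides this step from you.

```ruby
require "uri"

# Build a Solr select URL that asks for the Ruby response format.
# base_url and the "/select" handler path are assumptions for this sketch.
def solr_select_url(base_url, query, rows: 10)
  params = { "q" => query, "rows" => rows, "wt" => "ruby" }
  "#{base_url}/select?#{URI.encode_www_form(params)}"
end

puts solr_select_url("http://localhost:8983/solr", "title:ferret", rows: 5)
```

The body returned for such a request is Ruby literal syntax, so a client would typically eval it into a Hash; that is only sensible against a Solr server you trust.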

How to use a custom analyzer with solr? You have to code it in Java
(or you do your analysis before feeding the data into java land,
which I wouldn’t consider good app design).

Most users would not need to write a custom analyzer. Many of the
built-in ones are quite configurable. Yes, Solr does require schema
configuration via an XML file, but there have been acts_as_solr
variants (good and bad thing about this git craze) that generate that
for you automatically from an AR model.

But even if you do that then you have
a) half a java project (I don’t want that)

That’s totally fair, and really the primary compelling reason for a
Ferret over Solr for pure Ruby/Rails projects. I dig that.

But isn’t Ferret like 60k lines of C code too?!

and b) no way to use your existing rails classes in that custom
analyzer (I have analyzers using rails models to retrieve synonyms
and narrower terms for thesaurus based query expansion)

You could leverage client-side query expansion with Solr… just take
the user’s query, massage it, and send whatever query you like to
Solr. Solr also has synonym and stop word capability.
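
A minimal sketch of what such client-side expansion could look like, in plain Ruby. The synonym table `QUERY_SYNONYMS` and the helper `expand_query` are hypothetical names invented for this example; in a Rails app the table would presumably come from your own models.

```ruby
# Hypothetical synonym table; a real app might load this from its models.
QUERY_SYNONYMS = { "bike" => ["bicycle", "cycle"] }.freeze

# Rewrite each whitespace-separated term into an OR group when it has synonyms.
def expand_query(user_query, synonyms = QUERY_SYNONYMS)
  user_query.split.map do |term|
    alternatives = [term] + synonyms.fetch(term.downcase, [])
    alternatives.size > 1 ? "(#{alternatives.join(' OR ')})" : term
  end.join(" ")
end

puts expand_query("cheap bike")
```

The expanded string is what you would then send to the search server as the query. This sketch ignores quoting and operators already present in the user's input.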

However, there is also no reason (and I have this on my
copious-free-time-TODO-list) that JRuby couldn’t be used behind the scenes of a
Solr analyzer/tokenizer/filter or even request handler… and do all
the cool Ruby stuff you like right there. Heck, you could even send
the Ruby code over to Solr to execute there if you like ;-)

Here’s what I would do if I experienced severe problems with
Ferret in any of my projects:

Take aaf, replace Ferret with Lucene or even make it modular to
decide at run time which one to use, run the DRb server (or the
whole app, that depends) under JRuby and call it acts_as_lucene :-)
Et voila - great Rails integration plus Lucene’s maturity. But as
long as Ferret’s working fine for me that’s really unlikely to
happen… Unless somebody wants to sponsor that project, of course ;-)

Just using Solr and fixing up acts_as_solr to meet your needs (if it
doesn’t) would be even easier than all that :-) Solr really is a
better starting point than Lucene directly, for caching, scalability,
replication, faceting, etc.

I’d be curious to see scalability comparisons between Ferret and Solr
- or perhaps more properly between Stellr and Solr - as it boils down
to number of documents, queries per second, and faceting and
highlighting speed. I’m betting on Solr myself (by being so into it
and basing my professional life on it).

Erik

Hi!

On 27.08.2008, at 20:20, Eric S. wrote:

Thanks for all the info, I just found a very good related discussion
from ruby-forum which I thought I’d share

http://www.ruby-forum.com/topic/137629

well, in this discussion there are (besides some useful information)
some pretty biased statements from several people who obviously must
have had a frustrating time with Ferret, or just didn’t get it working
right out of the box and decided it was cheaper to make their clients
switch search technology (and possibly losing features) than to fix
their deployment. I never had somebody from engine yard contact me
regarding their massive ferret deployment problems, not sure how hard
they really tried to get over them.

Imho it’s not very likely that it’s Ferret’s fault that, while all
around the world people are running ferret based apps fine, every
client of engine yard experiences the same set of problems…

So here’s my very own biased opinion just to complete the picture :-)

I use Ferret in several production projects with several customers,
and also choose it for new projects like the soon-to-be-released new
full text search for the German selfhtml.org portal or the search
feature at www.fahrrad-xxl.de, which tightly integrates aaf with rdig
(shameless plug: selfhtml.org search will be powered by Stellr [1] ;-).

I have absolutely no problem with Ferret not being very actively
maintained, because it works for me just like it is. Honestly, I
never had ferret segfault in any one of my own production apps.
(But I admit I saw it segfault in other places, maybe I just don’t do
the right things to make it crash…)

So why do I stick to Ferret while others declare it a ‘dead’ project?
Ferret’s flexibility and feature set plus the level of Rails
integration it offers by means of aaf is very unlikely to be reached
by any other combination of search engine lib + Rails plugin in the
near future.
That said, I’m really interested in how the KinoSearch/Lucy stuff
will go on…

Solr, while being an interesting project without doubt, won’t ever
reach the level of Rails integration that’s possible with
acts_as_ferret, simply because its server doesn’t run in the context
of the rails app with model classes and all that stuff. It’s an
independent server indexing whatever you throw over the fence via
http+xml. That framework independence is a great plus under some
circumstances (and my Stellr project scratches exactly that itch in a
much more lightweight and undoubtedly less scalable manner), but
sometimes it’s also a bad thing.

How to use a custom analyzer with solr? You have to code it in Java
(or you do your analysis before feeding the data into java land, which
I wouldn’t consider good app design). But even if you do that then you
have
a) half a java project (I don’t want that)
and b) no way to use your existing rails classes in that custom
analyzer (I have analyzers using rails models to retrieve synonyms
and narrower terms for thesaurus based query expansion)

Not to speak of Sphinx here, which offers even less integration with
your Rails application because it’s tied directly to the database and
doesn’t support stuff like real incremental indexing. It’s easy to be
several times faster when you leave out most of the features…

Of course there are lots of use cases where Sphinx or Solr are
perfectly valid choices, because their feature set suits the
requirements and/or you’re comfortable with running a servlet
container in your production env and spreading your application logic
across several languages.

Here’s what I would do if I experienced severe problems with Ferret
in any of my projects:

Take aaf, replace Ferret with Lucene or even make it modular to decide
at run time which one to use, run the DRb server (or the whole app,
that depends) under JRuby and call it acts_as_lucene :-)
Et voila - great Rails integration plus Lucene’s maturity. But as long
as Ferret’s working fine for me that’s really unlikely to happen…
Unless somebody wants to sponsor that project, of course ;-)

Cheers,
Jens

[1] http://rubyforge.org/projects/stellr


Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

On Aug 28, 2008, at 3:11 AM, John L. wrote:

Is KinoSearch something that could be a Ferret replacement?

Yes. The projects are roughly comparable.

I’d be happier if Ferret’s ultimate successor was named “Lucy”,
though, because then more credit would flow to Dave.

What are the differences between it and Ferret?

From a high level, they’re pretty similar. Analyzer, QueryParser,
IndexReader, and all that.

There are superficial differences in the implementations of individual
classes. For instance, Ferret provides several different Tokenizer
classes; KinoSearch provides one, based on a regex pattern matching
one token.

 # KinoSearch version of WhiteSpaceTokenizer
 tokenizer = Tokenizer.new(:pattern => "\\S+")
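
To make the one-pattern idea concrete, here is a plain Ruby sketch of what a tokenizer driven by a single "one token" regex does. It is not KinoSearch or Ferret code; the helper name `tokenize_with_offsets` is made up for this illustration.

```ruby
# Split text into tokens using one regex that matches a single token.
# Returns [token_text, start_offset] pairs; /\S+/ mimics whitespace tokenizing.
def tokenize_with_offsets(text, pattern = /\S+/)
  tokens = []
  text.scan(pattern) do
    match = Regexp.last_match
    tokens << [match[0], match.begin(0)]
  end
  tokens
end

p tokenize_with_offsets("Ferret is  alive")
```

Swapping in a different pattern, for example `/\w+/`, changes the tokenizing behaviour without needing a separate tokenizer class, which is the design trade-off described above.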

At a low level, things start to diverge. For instance, all metadata
in the KinoSearch index file format is encoded as JSON, so it’s human-
readable for easy spelunking and debugging. Also, it’s easier to
override methods in KinoSearch, so you can do things like implement
SearchServer/SearchClient or MockScorer or KSx::Highlight::Summarizer
in pure Perl; I believe the mechanism will work similarly with Ruby
bindings.

what are the differences between it and the planned Lucy project

Personally, I think of them as the same project. KinoSearch is at
version 0.x and will soon become version 1.0. Lucy will be version 2
– KinoSearch’s successor.

Lucy has never had a high-level API – the work Dave and I did was all
on the low-level core. That core has now been fully implemented in
the KinoSearch dev branch.

What happens between version 1 and 2 depends on how the rollout of
version 1 goes.

Do you happen to know if Dave is likely to work on Ferret again
someday?

I know he would like to. However, I hope to persuade him to return to
his work on Lucy. :-)

Marvin H.
Rectangular Research
http://www.rectangular.com/

Hi!

On 28.08.2008, at 18:24, Marvin H. wrote:
[…]

There are superficial differences in the implementations of
individual classes. For instance, Ferret provides several different
Tokenizer classes; KinoSearch provides one, based on a regex pattern
matching one token.

 # KinoSearch version of WhiteSpaceTokenizer
 tokenizer = Tokenizer.new(:pattern => "\\S+")

That’s pretty simple ;-) With Ferret I can use custom tokenizers to
inject additional terms at the same offset (i.e., synonyms), is there
another way to achieve that with KinoSearch?
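
For readers unfamiliar with the technique, here is a library-independent sketch of injecting synonyms at the same offset. The `Token` struct, the `INDEX_SYNONYMS` table and the `pos_inc` field (a position increment of 0 meaning "same position as the previous token") are assumptions made for illustration, not the Ferret or KinoSearch API.

```ruby
# Illustrative token record; field names are assumptions for this sketch.
Token = Struct.new(:text, :start_offset, :end_offset, :pos_inc)

# Hypothetical synonym table.
INDEX_SYNONYMS = { "bike" => ["bicycle"] }.freeze

# Emit each term, followed by its synonyms at the same offsets with pos_inc 0.
def tokens_with_synonyms(text, synonyms = INDEX_SYNONYMS)
  tokens = []
  text.scan(/\S+/) do
    match = Regexp.last_match
    term = match[0].downcase
    tokens << Token.new(term, match.begin(0), match.end(0), 1)
    synonyms.fetch(term, []).each do |synonym|
      tokens << Token.new(synonym, match.begin(0), match.end(0), 0)
    end
  end
  tokens
end

tokens_with_synonyms("red bike").each { |t| puts t.to_a.inspect }
```

Because the synonym shares the original term's offsets and adds no position, a search for either word would match the same spot in the text.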

[…]

Do you happen to know if Dave is likely to work on Ferret again
someday?

I know he would like to. However, I hope to persuade him to return
to his work on Lucy. :-)

whatever, as long as it’s as powerful and easy to use as Ferret and
has ruby bindings I’m all for it :-)

Cheers,
Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold