On 28.08.2008, at 17:17, Erik H. wrote:
> If you’re talking about custom analyzers being in Ruby, more on that
> below.
It’s not only custom analyzers, but also the fact that acts_as_ferret’s
DRb server runs with the full Rails application loaded. For example, to
bulk index a number of records, aaf just hands the server the ids and
class name of the records to index, and the server does the rest. It’s
debatable whether one approach is better than the other; in terms of
index server load it might even be better to do as much as possible on
the client side. Still, it’s a much tighter coupling than you get with
the application-agnostic interfaces of Solr or stellr.
I must admit that I have a hard time coming up with another example
besides my synonym/thesaurus analysis stuff where this might be useful,
but I think there are more use cases where such a tight integration
might come in handy.
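To make that contract concrete, here’s a rough DRb sketch. The class
and method names are illustrative, not aaf’s actual API:

```ruby
require 'drb/drb'

# Sketch of the aaf-style contract: the client sends only the model
# class name and record ids; the server, which has the full Rails app
# loaded, looks the records up and indexes them itself.
# Names are illustrative, not acts_as_ferret's actual API.
class IndexServer
  def bulk_index(class_name, ids)
    # In aaf this would be roughly:
    #   class_name.constantize.find(ids).each { |r| index << r }
    "indexed #{ids.size} #{class_name} records"
  end
end

DRb.start_service('druby://localhost:0', IndexServer.new)
client = DRbObject.new_with_uri(DRb.uri)
result = client.bulk_index('Article', [1, 2, 3])
puts result   # indexed 3 Article records
DRb.stop_service
```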
>> It’s an independent server indexing whatever you throw over the
>> fence via http+xml.
> Solr can index CSV now, as well as a relational database directly
> (with the new DataImportHandler).
> It also responds with a Ruby hash structure (just add &wt=ruby to the
> URLs, or use solr-ruby, which does that automatically and hides all
> server communication from you anyway).
Yeah, I know, but either way there is a strict line between your
application and Solr, which doesn’t know a thing about the application
using it.
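As an aside, the &wt=ruby format Erik mentions is literally a Ruby hash
that can be eval’d; the response body below is a made-up example of the
shape, not captured Solr output:

```ruby
# A Solr response requested with &wt=ruby is a Ruby hash literal, so it
# can be eval'd directly (solr-ruby does essentially this internally).
# The body below is a hypothetical example, not real Solr output.
response_body = <<~'RUBY'
  {'responseHeader'=>{'status'=>0,'QTime'=>2},
   'response'=>{'numFound'=>1,'start'=>0,
     'docs'=>[{'id'=>'42','title'=>'Ferret vs. Solr'}]}}
RUBY

result = eval(response_body)   # only do this with a trusted server!
puts result['response']['numFound']            # 1
puts result['response']['docs'].first['title'] # Ferret vs. Solr
```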
>> How do you use a custom analyzer with Solr? You have to code it in
>> Java (or do your analysis before feeding the data into Java land,
>> which I wouldn’t consider good app design).
> Most users would not need to write a custom analyzer. Many of the
> built-in ones are quite configurable. Yes, Solr does require schema
> configuration via an XML file, but there have been acts_as_solr
> variants (a good and bad thing about this git craze) that generate
> that for you automatically from an AR model.
Glad you mentioned this. I don’t want to configure an analyzer via XML
when I can throw my own together with 4 or 5 lines of easy-to-read Ruby
code. Same for the index structure. A philosophical mismatch between
the Java and Ruby worlds, I think.
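Those 4 or 5 lines look roughly like this, sketched in plain Ruby
rather than against the actual Ferret::Analysis API, and with a made-up
stop list:

```ruby
# The shape of a hand-rolled analyzer: tokenize, downcase, drop stop
# words. A real Ferret analyzer would subclass
# Ferret::Analysis::Analyzer and return a token stream, but the
# pipeline itself really is just a few lines. Stop list is made up.
STOP_WORDS = %w[the a an and or of].freeze

def analyze(text)
  text.downcase.scan(/\w+/).reject { |t| STOP_WORDS.include?(t) }
end

tokens = analyze('The Ferret and the Solr')
puts tokens.inspect   # ["ferret", "solr"]
```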
>> But even if you do that, then you have
>> a) half a Java project (I don’t want that)
> That’s totally fair, and really the primary compelling reason for
> Ferret over Solr for pure Ruby/Rails projects. I dig that.
> But isn’t Ferret like 60k lines of C code, too?!
True, but I don’t have to compile that every time I deploy my app…
>> and b) no way to use your existing Rails classes in that custom
>> analyzer (I have analyzers using Rails models to retrieve
>> synonyms and narrower terms for thesaurus-based query expansion)
> You could leverage client-side query expansion with Solr… just
> take the user’s query, massage it, and send whatever query you like
> to Solr. Solr also has synonym and stop word capability, too.
Yeah, I could do that. But that’s moving analysis stuff into my
application, which is quite contrary to the purpose of analyzers: to
encapsulate this logic and make it pluggable into the search engine
library. So fewer style points for this solution…
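For completeness, Erik’s client-side variant would look roughly like
this; the synonym table here is a stand-in for whatever the Rails
thesaurus models would return:

```ruby
# Client-side query expansion, roughly as Erik suggests: massage the
# user's query before it ever reaches Solr. The synonym table is a
# stand-in for a lookup against the Rails thesaurus models.
SYNONYMS = { 'car' => %w[automobile vehicle] }.freeze

def expand_query(query)
  query.split.map { |term|
    alts = [term] + SYNONYMS.fetch(term, [])
    alts.size > 1 ? "(#{alts.join(' OR ')})" : term
  }.join(' ')
end

expanded = expand_query('red car')
puts expanded   # red (car OR automobile OR vehicle)
# The expanded string would then be sent to Solr, e.g. via solr-ruby's
# Solr::Connection#query.
```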
> However, there is also no reason (and I have this on my copious-
> free-time-TODO-list) that JRuby couldn’t be used behind the scenes
> of a Solr analyzer/tokenizer/filter or even request handler… and do
> all the cool Ruby stuff you like right there. Heck, you could even
> send the Ruby code over to Solr to execute there if you like.
That sounds sexy.
> Just using Solr and fixing up acts_as_solr to meet your needs (if it
> doesn’t) would be even easier than all that. Solr really is a better
> starting point than Lucene directly, for caching, scalability,
> replication, faceting, etc.
Depends on whether you need these features or not. In my experience,
lots of projects don’t need these things anyway, because they’re
running on a single host and nearly every other part of the application
is slower than search… Maybe it’s because I’m quite involved with the
topic and familiar with Lucene’s API, but to me Solr looks like an
additional layer of abstraction and complexity which I only want to
have when it really gives me a feature I need. Plus, the last time I
checked, Lucene didn’t need XML configuration files.
In development environments, and especially when it comes to automated
tests / CI, it’s also quite convenient not to have to run a separate
server but to use the shortcut directly to the index, which isn’t
possible with Solr.
> I’d be curious to see scalability comparisons between Ferret and
> Solr - or perhaps more properly between Stellr and Solr - as it
> boils down to number of documents, queries per second, and faceting
> and highlighting speed. I’m betting on Solr myself (by being so
> into it and basing my professional life on it).
This would be interesting, but I wouldn’t be that disappointed if
Stellr ended up second, given the little time I’ve spent building it so
far. Just out of curiosity, do you have some kind of performance
testing suite for Solr which I could throw at Stellr?
Cheers,
Jens
–
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database