I’ve been running some multithreaded tests on Ferret. Using a single
Ferret::Index::Index inside a DRb server, it behaves for me as if all
readers are locked out of the index whenever writing is going on in that
index, not just during optimization. That certainly holds whenever
segment merging happens, which is when the writes take the longest and
you can therefore least afford to lock out all reads. This is very easy
to notice when you add, say, your 100,000th document to the index: that
one write takes over 5 seconds to complete because it triggers a cascade
of incremental segment merging, and all queries to the index stall in
the meantime. Adding your millionth document can stall all reads for
over a minute.
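For concreteness, the setup is essentially the following. This is a
trimmed-down sketch rather than my actual test code; the path, port,
and document fields are placeholders:

  # Server process: one shared Index living behind DRb.
  require 'drb'
  require 'ferret'

  index = Ferret::Index::Index.new(:path => '/tmp/ferret_speedtest_index')
  DRb.start_service('druby://localhost:9000', index)
  DRb.thread.join

  # Client processes: reads and writes both go through the same remote Index.
  require 'drb'

  DRb.start_service
  index = DRbObject.new_with_uri('druby://localhost:9000')
  index << {:title => 'entry', :body => 'some text'}   # a write
  index.search_each('body:text') { |id, score| }       # a read; stalls while a write is merging segments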
When I try to use an IndexReader in a separate process, things are
even worse. The IndexReader doesn’t see any updates made to the index
after the reader was created. That’s not too surprising, but if I
instead create a new IndexReader for every query, and have the Index in
the other, writing process turn on auto_flush, then the reading process
crashes after a few queries (generally fewer than 100), in one of at
least two different ways, selected apparently at random (there’s a
sketch of the reader loop after the crash output below):
Failure Mode #1:
  script/ferret_speedtest2_reader:30:in `initialize': IO Error occured at <except.c>:93 in xraise (IOError)
      Error occured in index.c:901 - sis_find_segments_file
      Error reading the segment infos. Store listing was


      from script/ferret_speedtest2_reader:30:in `new'
      from script/ferret_speedtest2_reader:30:in `run_test_query'

[Yes, there really are two blank lines after “Store listing was”.]
Failure Mode #2:
  script/ferret_speedtest2_reader:30:in `initialize': IO Error occured at <except.c>:93 in xraise (IOError)
      Error occured in fs_store.c:127 - fs_each
      doing 'each' in /Users/scott/dev/ruby/timetracker/tmp/ferret_speedtest_index:
      from script/ferret_speedtest2_reader:30:in `new'
      from script/ferret_speedtest2_reader:30:in `run_test_query'
Meanwhile, if I try eliminating this second failure mode by explicitly
calling close on the IndexReader
before I throw it away, the close immediately crashes with:
script/ferret_speedtest2_reader:45: [BUG] Bus Error
ruby 1.8.6 (2007-03-13) [i686-darwin8.8.5]
Abort trap
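For reference, the reading process does essentially the following for
each query. Again, this is a simplified sketch rather than the actual
test script (the field name and query are placeholders), but line 30 in
the tracebacks above corresponds to the IndexReader.new call, and line
45 to the close:

  require 'ferret'

  INDEX_PATH = '/Users/scott/dev/ruby/timetracker/tmp/ferret_speedtest_index'

  def run_test_query(query_string)
    # Open a fresh reader per query, so we see whatever the writer
    # process has auto_flushed so far. (This is the call that raises
    # the IOErrors above.)
    reader   = Ferret::Index::IndexReader.new(INDEX_PATH)
    searcher = Ferret::Search::Searcher.new(reader)
    query    = Ferret::QueryParser.new(:default_field => 'body').parse(query_string)
    hits     = 0
    searcher.search_each(query) { |doc_id, score| hits += 1 }
    # reader.close   # closing explicitly here is what dies with [BUG] Bus Error
    hits
  end

  100.times { run_test_query('test') }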
Given the combination of problems above, I’m at a loss to understand
how to use Ferret on a live website that requires reasonably fast
turnaround between a user submitting data and the user being able to
search over that data, unless either (1) the site only gets a few
thousand new index entries per day and the site can be taken down for
a few minutes daily to optimize the index, or (2) it’s OK for the
entire site to periodically stall on all queries for seconds or even
minutes whenever segment-merging happens to kick in.
Do all Ferret users just suck it up and live with one of these
limitations, or am I missing something and/or just getting “lucky”
with the errors above?
For reference, the system being used here is a Mac running Leopard,
although I doubt that matters…