Ferret’s memory usage when searching

Hello,

We are experiencing some performance problems when using Ferret and we
are trying to isolate the cause.

We have about 80 GB of indexes for one of our clients, and when a
search is performed on those indexes the application gets really slow
and eventually stops responding. We’ve been monitoring the memory
usage, and it rises very rapidly as the indexes are being loaded.

Ferret’s documentation says the index reader is automatically closed
during garbage collection, but either this doesn’t work, or it takes
much longer to happen than would be ideal for us.

So we are running out of memory, and the mongrel instances become
unresponsive to the point that not even monit can restart them; we
have to kill the instances manually.

Does anyone know how Ferret manages its memory usage? Does it try to
load all the indexes needed for a search into RAM at once?

If that’s the case, what happens when the size of the indexes exceeds
the available RAM?

Has anyone had this problem before?

Any help would be greatly appreciated.

Hi!

First of all, among how many indexes are those 80 GB distributed, and
how large is the largest index? In any case, this sounds like a really
huge setup :)

On Wed, Jun 04, 2008 at 03:58:20PM -0400, Efrén Díaz wrote:

> Ferret’s documentation says the index reader is automatically closed
> during garbage collection, but either this doesn’t work, or it takes
> much longer to happen than would be ideal for us.

Did you try to close the readers manually instead of waiting for the GC
to do it?
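
For example, something like this (a minimal sketch using Ferret’s
high-level Index class; the path and query are placeholders):

  require 'ferret'

  # Placeholder path and query -- adjust to your setup.
  index = Ferret::Index::Index.new(:path => '/path/to/index')
  begin
    index.search_each('title:ferret') do |doc_id, score|
      puts "#{doc_id} scored #{score}"
    end
  ensure
    # Close explicitly to free the reader's memory right away,
    # instead of waiting for the GC finalizer.
    index.close
  end

The same applies if you use Ferret::Index::IndexReader directly: call
reader.close as soon as you’re done with it.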

> So we are running out of memory, and the mongrel instances become
> unresponsive to the point that not even monit can restart them; we
> have to kill the instances manually.

It’s probably not a good idea to open the readers directly inside the
mongrels, since this will of course multiply the maximum memory needed
by the number of mongrel instances running. A single shared search
process avoids this (see the sketch below).

> Does anyone know how Ferret manages its memory usage? Does it try to
> load all the indexes needed for a search into RAM at once?

It doesn’t try to load the whole index into RAM, but a reader certainly
keeps some data structures in memory to speed up searching. That would
be another benefit of having a separate (multithreaded) server handle
the search: readers are opened once on startup and kept open the whole
time, or at least until you re-open them to reflect any index changes.
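
Roughly like this, as a sketch using Ruby’s standard DRb library (the
URI, class name and paths here are made up for illustration;
acts_as_ferret ships a DRb-based server along these lines):

  require 'drb'
  require 'ferret'

  # One process owns the only open index; the mongrels talk to it via DRb.
  class SearchServer
    def initialize(path)
      @path  = path
      @index = Ferret::Index::Index.new(:path => path)
    end

    # Run a query and return [doc_id, score] pairs.
    def search(query, limit = 10)
      hits = []
      @index.search_each(query, :limit => limit) do |doc_id, score|
        hits << [doc_id, score]
      end
      hits
    end

    # Re-open the index to pick up changes written by an indexer.
    def refresh
      @index.close
      @index = Ferret::Index::Index.new(:path => @path)
    end
  end

  DRb.start_service('druby://localhost:9010', SearchServer.new('/path/to/index'))
  DRb.thread.join

Each mongrel would then do something like

  searcher = DRbObject.new_with_uri('druby://localhost:9010')
  hits = searcher.search('title:ferret')

so only one reader is open no matter how many mongrels you run.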

> If that’s the case, what happens when the size of the indexes exceeds
> the available RAM?

It’s definitely possible to search an index larger than the available
RAM. However, I don’t know of a way to estimate the amount of RAM
needed to search an index of a given size; I’d say it also depends on
your usage pattern and index contents (i.e., the number of terms and
documents).

> Has anyone had this problem before?

No, but I don’t have indexes that large either.

Cheers,
Jens

