Hi everyone,
I’ve been exploring using Ferret to index and search production log
files. My initial approach was to add each line of each log file to the
index as its own document, indexing the “content” part of the line but
not storing it. I also stored the original file name and line number so
that after a search I could go load the matching content from the
original log files (hoping to keep my index size down).
So far I’ve run it on a sample set of 144 MB of log files, which
produced an index of about 70 MB at an indexing time of roughly one
second per MB.
The problem with this approach is that we have a LOT of logs: 30 days is
roughly 800 GB. So I’ll probably just index incrementally every few
hours.
My question is: given that I want to find lines across multiple log
files that match a query (based on a string in the “content”, or on the
date or log level, both of which I can parse out of each line), what is
the best approach? Is Ferret even meant for something like this?
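For context, the per-line parsing I have in mind is just a regexp over each line; the timestamp/level layout below is a hypothetical stand-in for our actual format:

```ruby
# Hypothetical log line layout: "2009-03-01 12:34:56 ERROR message..."
LOG_LINE = /\A(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+([A-Z]+)\s+(.*)\z/

# Returns a hash of fields suitable for indexing as a document,
# or nil if the line doesn't match the expected layout.
def parse_log_line(line)
  m = LOG_LINE.match(line.chomp) or return nil
  {:date => m[1], :level => m[2], :content => m[3]}
end
```

With the date and level pulled out as separate fields, queries could combine them with the free-text content (e.g. restricting by level and a date range).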
Thanks,
Chris