Is it possible to use Ferret to do substring searches efficiently? If
not, what can I use?
Problem: I have 1 million+ strings that need to be matched against
1 million+ substrings. For example:
iliketotraveltohawaii
travelmagazine
Both of these should match the substring “travel”, but only the first
should match “hawaii”.
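To make the expected result concrete, here is a minimal plain-Ruby sketch of the matching I'm after. It does not use Ferret at all, and the helper name matches_for is just something I made up for illustration. It is a brute-force check, so it only shows the desired output, not a workable approach at this scale:

```ruby
# Hypothetical helper (not part of Ferret): for each substring, collect
# the strings that contain it. Brute force, O(strings * substrings),
# so far too slow for 1M x 1M; it only illustrates the expected matches.
def matches_for(strings, substrings)
  substrings.each_with_object({}) do |sub, result|
    result[sub] = strings.select { |s| s.include?(sub) }
  end
end

strings    = ["iliketotraveltohawaii", "travelmagazine"]
substrings = ["travel", "hawaii"]

matches_for(strings, substrings).each do |sub, hits|
  puts "#{sub}: #{hits.join(', ')}"
end
```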
What I have tried:
I used Ferret to create an index with a WhiteSpaceAnalyzer, after
splitting each string into space-separated characters so that every
character becomes its own token:
travelmagazine -> t r a v e l m a g a z i n e
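The splitting step is roughly the following plain-Ruby sketch. The helper name to_char_tokens is hypothetical (it is not a Ferret call); it only prepares the text before it is handed to the index:

```ruby
# Hypothetical helper (not part of Ferret): turn a string into
# space-separated characters, so that a whitespace-based analyzer
# sees one token per character.
def to_char_tokens(text)
  text.chars.join(" ")
end

puts to_char_tokens("travelmagazine")
```

The same transformation is applied to the search term, so that a search for “travel” becomes a phrase of six single-character terms.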
This works, and the index is generated very quickly, but the search
(a PhraseQuery) is very slow: around 200-300 ms.
I’m concerned that either Ferret is the wrong tool for this or I’m just
taking the wrong approach to this problem.