Problem getting "extract" from RDig

vinay · September 27, 2007, 11:16am

Hi All,

I need a site-wide search for my current application, covering both the
static content and the dynamic content from the database. I have been
searching the net for a while, and RDig seems to be an apt solution.
While using it I have run into a few problems. These may be very basic
issues, but I have not been able to figure out what is wrong with the
code.

I had the following lines in my /config/environment.rb

require 'rdig'
require 'rdig_config'

I have the following code in my /config/rdig_config.rb

```
RDig.configuration do |cfg|
  cfg.crawler.start_urls = [ 'http://localhost:3000/login/index' ]
  cfg.index.path = "C:/rails/managedsupport/index/development/rdig-index"
  cfg.verbose = true
  cfg.content_extraction = OpenStruct.new(
    :hpricot => OpenStruct.new(
      :title_tag_selector   => 'title',
      :content_tag_selector => 'body'
    )
  )
end
```
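
Note that `:content_tag_selector => 'body'` hands the whole page body to the indexer, navigation included. Below is a minimal sketch of a narrower setting. It assumes (not confirmed anywhere in this thread) that the option accepts a CSS selector, and that the layout wraps the main content in an element with the hypothetical id `content`. The structure is built standalone here so it can be inspected without RDig loaded.

```ruby
require 'ostruct'

# Same nested structure as the configuration above, built on its own.
# '#content' is a hypothetical id; substitute whatever element wraps the
# main content in your own layout.
content_extraction = OpenStruct.new(
  :hpricot => OpenStruct.new(
    :title_tag_selector   => 'title',
    :content_tag_selector => '#content'   # was 'body', which includes the menus
  )
)

puts content_extraction.hpricot.content_tag_selector
```

If this applies to your setup, the index would need to be rebuilt after changing the selector.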

I have created the index using the command

rdig -c config/rdig_config.rb

Now in my controller I have written some code to test the functionality

```
search_results = RDig.searcher.search("some_string")
@results = search_results[:list]
@hitcount = search_results[:hitcount]
```

The :extract of each entry in @results contains the same initial view
code that is common to the whole application, that is, my menus and
submenus. I am not getting an extract related to the term I searched
for.

Any help in this regard would be highly appreciated…
Thanks in advance:)

vinay · October 1, 2007, 4:31pm

If you’re creating an index from a database, wouldn’t you use AAF? As
far as I know, RDig is for indexing external pages.

If you still want to try to get RDig to work, try crawling an external
URI. If that doesn’t work, try it from the command line.

vinay · October 1, 2007, 7:21pm

Eggman Eggman wrote:

If you’re creating an index from a database, wouldn’t you use AAF? As
far as I know, RDig is for indexing external pages.

If you still want to try to get RDig to work, try crawling an external
URI. If that doesn’t work, try it from the command line.

I am using AAF for the module-level search. But for searching the entire
site I need something that covers both the static and the dynamic
content, and I have no clue how to make AAF search the static content of
my application. If you have done something similar, could you kindly
share your experience or code with me?
Cheers,
jazzy

vinay · October 11, 2007, 1:52pm

jazzy jazzy wrote:
I did dig into the API and found that my web pages were indexed as
documents, and that the search calls the following function.

RDig::Search::Searcher

File lib/rdig/search.rb, line 43

    def search(query, options={})
      result = {}
      query = query_parser.parse(query) if query.is_a?(String)
      puts "Query: #{query}"
      results = []
      searcher = ferret_searcher
      result[:hitcount] = searcher.search_each(query, options) do |doc_id, score|
        doc = searcher[doc_id]
        results << { :score => score,
                     :title => doc[:title],
                     :url => doc[:url],
                     :extract => build_extract(doc[:data]) }
      end
      result[:list] = results
      result
    end

The extract is built by calling the function build_extract

File lib/rdig/search.rb, line 60

    def build_extract(data)
      (data && data.length > 200) ? data[0...200] : data
    end

Is it a bug in the library that the extract is just the first 200
characters of the indexed document, or did I fail to index my web pages
and set up my crawler properly in the first place? Any help in this
regard would be highly appreciated.

Cheers;-
Jazzy
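
On the build_extract question above: the method as quoted receives only the stored text, never the query, so it can only return the start of the document. If every page begins with the same menus, every extract will too. Below is a minimal sketch of a query-aware alternative. `keyword_extract` is a hypothetical helper, not part of RDig, and the page text is made up for illustration.

```ruby
# Same truncation logic as the build_extract quoted above: it never sees the query.
def build_extract(data)
  (data && data.length > 200) ? data[0...200] : data
end

# Hypothetical helper (not part of RDig): return a window of +width+ characters
# around the first case-insensitive occurrence of +term+, falling back to the
# leading +width+ characters when the term is absent or empty.
def keyword_extract(data, term, width = 200)
  return data if data.nil? || data.length <= width
  term = term.to_s
  pos = data.downcase.index(term.downcase)
  return data[0...width] if pos.nil? || term.empty?
  start = [pos - width / 2, 0].max            # centre the window on the match
  start = [start, data.length - width].min    # keep the window inside the text
  data[start, width]
end

# Made-up page text: menu boilerplate first, the interesting content later.
page = ("Home | Tickets | Reports | Admin | Logout " * 6) +
       "Your support contract renews on the first of March."

puts build_extract(page)                # menu text only
puts keyword_extract(page, "contract")  # window that contains "contract"
```

To use something like this, the controller would have to build the extract itself from the stored document text, since the searcher shown above only passes `doc[:data]` to `build_extract`.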