Hi all,
I'm having problems using RDig.
With this rdig config…
cfg.crawler.start_urls = ['http://www.defensetech.org']
cfg.crawler.include_hosts = ['www.defensetech.org']
cfg.index.path = '/my/path/to/index'
cfg.verbose = true
…I get this output:
$ rdig -c config/rdig_config.rb
/usr/local/lib/site_ruby/1.8/ferret/index/term.rb:45: warning: method redefined; discarding old text=
/usr/local/lib/site_ruby/1.8/ferret/search/sort_field.rb:69: warning: instance variable @name not initialized
/usr/local/lib/site_ruby/1.8/ferret/search/sort_field.rb:69: warning: instance variable @name not initialized
lib/ferret/query_parser/query_parser.y:128: warning: method redefined; discarding old initialize
lib/ferret/query_parser/query_parser.y:157: warning: method redefined; discarding old parse
lib/ferret/query_parser/query_parser.y:216: warning: method redefined; discarding old clean_string
/usr/lib/ruby/gems/1.8/gems/rubyful_soup-1.0.4/lib/rubyful_soup.rb:230: warning: method redefined; discarding old attrs
discovered content extractor class: RDig::ContentExtractors::PdfContentExtractor
discovered content extractor class: RDig::ContentExtractors::WordContentExtractor
discovered content extractor class: RDig::ContentExtractors::HtmlContentExtractor
using Ferret 0.9.0
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:116: warning: instance variable @patterns not initialized
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:105: warning: instance variable @patterns not initialized
added url http://www.defensetech.org
fetching http://www.defensetech.org
waiting for threads to finish…
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:116: warning: instance variable @patterns not initialized
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:105: warning: instance variable @patterns not initialized
added url http://www.defensetech.org
error processing document http://www.defensetech.org/: undefined local variable or method `url' for #<RDig::HttpDocument:0xb7a7fbb4>
Trace: /usr/local/lib/site_ruby/1.8/rdig/documents.rb:35:in `initialize'
/usr/local/lib/site_ruby/1.8/rdig/documents.rb:107:in `initialize'
/usr/local/lib/site_ruby/1.8/rdig/documents.rb:15:in `create'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:68:in `add_url'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:51:in `process_document'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:50:in `process_document'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:28:in `run'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:25:in `run'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:24:in `run'
/usr/local/lib/site_ruby/1.8/rdig.rb:258:in `run'
/usr/bin/rdig:14
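For context, the two kinds of message in that output behave differently in plain Ruby. Reading an instance variable that was never assigned just returns nil; older Rubies (before 3.0) print "instance variable @x not initialized" when run with -w, but execution continues. A bare name such as `url` that is neither a local variable nor a method raises NameError, which is what stops the document from being processed. The sketch below uses hypothetical stand-in classes (`Filter`, `Doc`), not RDig's own code, to show the difference:

```ruby
# Hypothetical stand-ins, NOT RDig's classes: they only reproduce the two
# kinds of message seen in the output above.

class Filter
  # @patterns is never assigned, so reading it yields nil. Rubies before 3.0
  # print "instance variable @patterns not initialized" under -w, but the
  # read itself does not fail.
  def patterns_or_default
    @patterns || []
  end
end

class Doc
  def initialize(uri)
    # `url` is neither a local variable nor a method here, so Ruby raises
    # NameError ("undefined local variable or method ... url ...").
    @uri = url
  end
end

puts Filter.new.patterns_or_default.inspect # prints []

begin
  Doc.new('http://www.defensetech.org/')
rescue NameError => e
  # Exact message wording differs between Ruby versions.
  puts "#{e.class}: #{e.message}"
end
```

So the @patterns lines are warnings only, while the `url` error is an actual exception raised inside the document constructor.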
If anyone could tell me why @patterns and url aren’t being set, I’d
really appreciate it.
I'm on Ubuntu 6.06 with Ruby 1.8.4 and these gems: rdig 0.3.0, rubyful_soup 1.0.4, ferret 0.9.4.
Many Thanks,
Steven