Hi,
I'm using Ferret and RDig, and I'm trying to index only the HTML that sits
between two marker tags, so far without success.
I just want to index data like this:
<!-- startToIndex -->
Here’s my HTML code which I want to index
My current configuration is the following:
cfg.content_extraction = OpenStruct.new(
  # HPRICOT configuration
  # hpricot is the html parsing lib used by RDig. See
  # http://code.whytheluckystiff.net/hpricot for usage information.
  # Any code blocks given for content selection will receive an Hpricot instance
  # containing the full page content when called.
  :hpricot => OpenStruct.new(
    # css selector for the element containing the page title
    :title_tag_selector => 'title',
    # might also be a proc returning either an element or a string:
    # :title_tag_selector => lambda { |hpricot_doc| ... }
    :content_tag_selector => 'body'
    # might also be a proc returning either an element or a string:
    # :content_tag_selector => lambda { |hpricot_doc| ... }
  )
)
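A minimal sketch of the extraction being asked for, written in plain Ruby with no Hpricot dependency. The helper name extract_marked and the end marker <!-- stopToIndex --> are hypothetical: only the start marker appears above, so the closing marker name is an assumption.

```ruby
# Marker patterns. START_MARKER matches the comment shown in the post;
# STOP_MARKER is an assumed name for the closing comment.
START_MARKER = /<!--\s*startToIndex\s*-->/
STOP_MARKER  = /<!--\s*stopToIndex\s*-->/

# Returns the HTML found between each start/stop marker pair, joined by
# newlines, or an empty string when no complete pair is present.
def extract_marked(html)
  # /m lets "." match newlines so a marked region may span several lines.
  pattern = /#{START_MARKER.source}(.*?)#{STOP_MARKER.source}/m
  html.scan(pattern).flatten.map(&:strip).join("\n")
end
```

Since the config comments say the content selector may be a proc returning a string, this could presumably be wired in as :content_tag_selector => lambda { |hpricot_doc| extract_marked(hpricot_doc.to_html) }. That wiring is an untested assumption, not something confirmed for RDig.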
Any help would be appreciated.
Best regards,