Nested loop for xpath

Ruby 1.9.3

I am scraping a web page using Nokogiri and XPath.

I want all the names of the birds listed there.

xpath1 = '//div[@id='bodyContent']/div[@id='mw-content-text']/ul[1]'

should get a group of birds:

Great Tinamou
Andean Tinamou
Elegant Crested Tinamou
Little Tinamou
Slaty-breasted Tinamou
Thicket Tinamou

I need to get each one of these names, so I tried a nested loop:

xpath1 = '//div[@id='bodyContent']/div[@id='mw-content-text']/ul'
doc = Nokogiri::HTML(open(url))
categories = doc.xpath(xpath1)

categories.each do |c|
  c.xpath('/li').each do |n|
    p n.text
  end
end

It gives empty values.
Can anyone tell why? Or are there better ways?

soichi

On Fri, Apr 5, 2013 at 7:57 AM, Soichi I. wrote:

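The xpath1 assignment as written does not work: the single quotes inside
the XPath end the string literal early. You need to escape the inner
single quotes or use double quotes.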
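
As a minimal sketch of the quoting options mentioned above (the variable names xpath_a, xpath_b and xpath_c are only illustrative), any of the following gives Ruby a valid string literal:

```ruby
# Option 1: double quotes outside, single quotes inside the XPath.
xpath_a = "//div[@id='bodyContent']/div[@id='mw-content-text']/ul[1]"

# Option 2: single quotes outside, double quotes inside the XPath.
xpath_b = '//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]'

# Option 3: single quotes throughout, with the inner ones escaped.
xpath_c = '//div[@id=\'bodyContent\']/div[@id=\'mw-content-text\']/ul[1]'

# Options 1 and 3 produce exactly the same string.
puts xpath_a == xpath_c
```

XPath itself accepts either quote character around attribute values, so options 1 and 2 select the same nodes.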

gives empty values.
Can anyone tell why?

Yes, your XPath in the nested loop searches for an <li> at the top of the
document because you prefix it with "/". You would need a relative path
such as 'li' or './li' instead.

    or are there better ways?

    Yes. A loop is not really needed here, since the XPath alone can
    select the names.

    irb(main):009:0> puts dom.xpath('//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]//a/text()').map(&:to_s)
    Great Tinamou
    Andean Tinamou
    Elegant Crested Tinamou
    Little Tinamou
    Slaty-breasted Tinamou
    Thicket Tinamou

    Note why #to_s is necessary to get String instances:

    irb(main):012:0> dom.xpath('//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]//a/text()').each {|n| p n}
    #<Nokogiri::XML::Text:0x…fc02b6340 "Great Tinamou">
    #<Nokogiri::XML::Text:0x…fc02b469e "Andean Tinamou">
    #<Nokogiri::XML::Text:0x…fc02b3d48 "Elegant Crested Tinamou">
    #<Nokogiri::XML::Text:0x…fc02b33f2 "Little Tinamou">
    #<Nokogiri::XML::Text:0x…fc02b1822 "Slaty-breasted Tinamou">
    #<Nokogiri::XML::Text:0x…fc02b1016 "Thicket Tinamou">
    => 0

    Btw, you can also use a more explicit XPath:
    '//div[@id="bodyContent"]/div[@id="mw-content-text"]/ul[1]/li/a/text()'

    (replaced “//” with “/li/”)

    Kind regards

    robert

Thank you for your help! It worked.