Parsing through downloaded html

sybrenkooistra · September 10, 2012, 11:38am

I get an error that mentions ‘block in irb_binding’ and ‘in each’. What is going wrong?

Code:

require 'nokogiri'
require 'spreadsheet'
Spreadsheet.client_encoding = "UTF-8"
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet
row = 0
Dir.chdir("anattempt")
Dir.glob('*.html').each do |document| # apparently something goes wrong here
  f = File.open(document)
  searchablefile = Nokogiri::HTML(f)

  variabelekoloma = searchablefile.xpath("//h1")
  sheet1[row, 0] = variabelekoloma.content
  row += 1
end
book.write 'htmltoexcel.xls'

Plus, how could I save a boolean? (For example: if the xpath has content/returns text, then ‘yes’.)

sybrenkooistra · September 10, 2012, 12:12pm

035 > Dir.glob('*.html').each do |document|
036 > f = File.open(document)
037?> searchablefile = Nokogiri::HTML(f)
038?> variabelekoloma = searchablefile.xpath("//h1")
039?> sheet1[row, 0] = variabelekoloma.content
040?> row += 1
041?> end
NoMethodError: undefined method 'content' for #<Nokogiri::XML::NodeSet:0xa501bbc>
from (irb):39: in 'block in irb_binding'
from (irb):35: in 'each'
from (irb):35

sybrenkooistra · September 10, 2012, 12:27pm

Please read the tutorial I gave you and copy the code exactly as I gave it.
My code uses at_xpath("//h1"); yours uses xpath("//h1"). at_xpath
returns the first node matching the criteria, while xpath returns a
NodeSet, an array-like collection of every matching node. So this error
tells you that the NodeSet class has no content method.

sybrenkooistra · September 11, 2012, 1:41pm

Alright, perfect. It works.

However, many of the variables I create depend on the POSSIBILITY that
a certain xpath/string exists in a file. If it doesn't, the variable
should preferably be nil or ‘no’.

With at_xpath followed by .content, if an xpath (for example pvda =
searchablefile.at_xpath("//h1[contains(text(), 'pvda')]")) returns
nothing, the call to .content (in this case sheet1[row,1] =
pvda.content) fails and the rest of the code no longer runs
(error: undefined method ‘content’ for nil:NilClass (NoMethodError)).

How can I work around this and/or save a boolean on the basis of an xpath?

sybrenkooistra · September 11, 2012, 2:04pm

Sure, that can be an issue. Here is a possible solution:

unless searchablefile.at_xpath("//h1[contains(text(), 'pvda')]").nil?
  sheet1[row, column] = searchablefile.at_xpath("//h1[contains(text(), 'pvda')]").content
end

sybrenkooistra · September 11, 2012, 2:45pm

Works like a charm

Any idea how I can save a boolean? (In some cases I do not want the
content of a node; I only want to know IF a certain word IS or IS NOT
part of the html.)

sybrenkooistra · September 11, 2012, 3:02pm

You can use a regex for that.
Search for a Ruby regex tutorial on Google; one example is the
“Ruby - Regular Expressions” tutorial.
Here is example code from that tutorial:

#!/usr/bin/ruby

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

# =~ returns the index of the first match, or nil when there is no match
if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end
if line2 =~ /Dogs(.*)/
  puts "Line2 contains Dogs"
end
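
Applied to the question in this thread, the regex test can be turned into a true/false value and then into the 'yes'/'no' text for a spreadsheet cell. A minimal sketch, assuming the file contents have already been read into a string; mentions? is a hypothetical helper name and the HTML is invented.

```ruby
# Hypothetical helper: true when +word+ occurs anywhere in +text+, ignoring case.
# Regexp.escape keeps characters such as "." or "?" in the word from being
# treated as regex syntax.
def mentions?(text, word)
  !(text =~ /#{Regexp.escape(word)}/i).nil?
end

# Made-up sample markup; in the loop this could be File.read(document).
html = '<html><body><h1>PvdA wint de verkiezingen</h1></body></html>'

has_pvda = mentions?(html, 'pvda')
has_cda  = mentions?(html, 'cda')

# Convert the boolean into the text to store in a cell.
pvda_cell = has_pvda ? 'yes' : 'no'
cda_cell  = has_cda ? 'yes' : 'no'

puts pvda_cell
puts cda_cell
```

Note that matching against the raw HTML also matches tag names and attribute values. If only the visible text should count, the string passed in could instead be the text extracted from the parsed document.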

sybrenkooistra · October 21, 2012, 11:41pm

On Sep 7, 2012, at 5:32, Ryan D. wrote:

Occupy Ruby—why we need to moderate the 1%:

Occupy Ruby - Why we need to moderate the 1% | 2012 Cascadia Ruby Conference | by ryan davis

I watched the talk and was led to read the article on “help vampires” by
Amy Hoy.
I was surprised to discover this gem:
"Note that I use ‘he’ here in the general sense even though Help
Vampires are almost exclusively male.
It appears that male Help Vampire, drawn as it is to shiny technology,
occupies an evolutionary niche
that females of the species simply do not find desirable."

I couldn’t help but chuckle. Please don’t shun Amy.