Re: Posix Pangrams (#97)

phrogz · October 6, 2006, 7:08pm

From: Jamie M.

(There’s no joy in trying to parse/string HTML.)
Sure there is.

require 'rubygems'; require 'hpricot'; require 'open-uri'
doc = Hpricot(open('http://www.unix.org/version3/apis/cu.html'))
puts (doc/"p.tent/i").map{|i|i.inner_html}

Hpricot makes HTML scraping fun again (no, really!).

I copy/pasted the rows from the table and did it like this:
puts <<ENDS.scan( /<i>([^<]+)/ ).flatten.join( ' ' )
#HTML HERE
ENDS
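A runnable sketch of the heredoc-and-scan idea, for anyone without the table handy. The two table rows are hypothetical stand-ins for the pasted HTML, and the sketch assumes the command names are wrapped in <i> tags (as the p.tent/i selector suggests). Anchoring the capture on <i> keeps the header row and the description column out of the result.

```ruby
# Hypothetical stand-in for the rows copied from the table; the real
# page's markup may differ.
html = <<ENDS
<tr><th>Command</th><th>Description</th></tr>
<tr><td><p class="tent"><i>awk</i></p></td><td>pattern scanning</td></tr>
<tr><td><p class="tent"><i>basename</i></p></td><td>return filename portion</td></tr>
ENDS

# scan with one capture group returns [["awk"], ["basename"]];
# flatten turns that into a plain array of command names.
names = html.scan(/<i>([^<]+)/).flatten
puts names.join(' ')   # prints "awk basename"
```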

Actually…I can’t get Hpricot to install to test your code (gem server
seems to be down) but doesn’t that grab way more information than you
wanted from the table? The table headers and all columns, too?

phrogz · October 6, 2006, 7:41pm

On 10/6/06, Gavin K. wrote:

Actually…I can’t get Hpricot to install to test your code (gem server
seems to be down) but doesn’t that grab way more information than you
wanted from the table? The table headers and all columns, too?

I'm assuming the problems with the gem server are related to the
problems with RubyForge. Hopefully that will be resolved soon; I know
it's being worked on.

Regarding your question: note the “/i” after “p.tent”.

Jacob F.

phrogz · October 6, 2006, 7:47pm

On 10/6/06, Gavin K. wrote:

Actually…I can’t get Hpricot to install to test your code (gem server
seems to be down) but doesn’t that grab way more information than you
wanted from the table? The table headers and all columns, too?

I am pulling the entire HTML file down, but by dividing the Hpricot
instance I'm essentially asking it to give me all the <i> tags that
are inside a <p> with the tent class. Given the content of the file, I
could probably just do doc/"i", but that would also grab the 'opt' from
the definition list up top.
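To make the scoping difference concrete without Hpricot installed, here is a rough sketch using plain regexes on a small hypothetical snippet (a definition list containing an <i>opt</i>, followed by two p.tent paragraphs). The regexes only approximate what doc/"i" and doc/"p.tent/i" select and assume simple, well-formed markup; a real parser is the better tool for the actual page.

```ruby
# Hypothetical snippet mimicking the page layout: a definition list
# up top, then command names in <p class="tent"> paragraphs.
html = <<HTML
<dl><dt><i>opt</i></dt><dd>an optional argument</dd></dl>
<p class="tent"><i>awk</i></p>
<p class="tent"><i>cat</i></p>
HTML

# Text of every <i> element anywhere (roughly what doc/"i" selects).
all_i = html.scan(%r{<i>([^<]*)</i>}).flatten

# First isolate each <p class="tent">...</p>, then take only the <i>
# elements inside it (roughly what doc/"p.tent/i" selects).
tent_i = html.scan(%r{<p class="tent">(.*?)</p>}m).flatten
             .flat_map { |inner| inner.scan(%r{<i>([^<]*)</i>}).flatten }

p all_i    # ["opt", "awk", "cat"]
p tent_i   # ["awk", "cat"]
```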

My code does give the same output as yours, except that, since I'm
putsing the array rather than joining it, I get one command per line.
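The only difference between the two outputs is how puts treats an array versus a single string. A small sketch, using a hypothetical list of names and StringIO so the output can be inspected (StringIO#puts follows the same rules as IO#puts and Kernel#puts):

```ruby
require 'stringio'

names = %w[awk basename cat]   # hypothetical sample of extracted names

# puts given an array writes each element on its own line.
per_line = StringIO.new
per_line.puts(names)

# Joining first produces one space-separated line.
joined = StringIO.new
joined.puts(names.join(' '))

print per_line.string   # awk, basename and cat on separate lines
print joined.string     # awk basename cat
```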

Jamie