I’ve been experimenting with open-uri to retrieve HTML from various
sites and have spent the past couple of days searching for an answer to
my problem. One page I’ve been trying to access contains a GUID in the
querystring and it has curly brackets around it. This is my code (I’m
just using Google for this example):
require 'open-uri'
uri = URI.parse('http://www.google.co.uk/search?q={3F2504E0-4F89-11D3-9A-0C-03-05-E8-2C-33-01}')
html = open(uri).read()
But I get the following error:
URI::InvalidURIError in DemoController#index
bad URI(is not URI?):
http://www.google.co.uk/search?q={3F2504E0-4F89-11D3-9A-0C-03-05-E8-2C-33-01}
RAILS_ROOT: /code/public/…/config/…
/usr/local/lib/ruby/1.8/uri/common.rb:432:in `split'
/usr/local/lib/ruby/1.8/uri/common.rb:481:in `parse'
#{RAILS_ROOT}/app/controllers/scrape_controller.rb:10:in `index'
From what I’ve found, curly brackets aren’t allowed in a Ruby URI
(despite my browser being capable of handling them). I’ve tried encoding
them with %7B and %7D, but then the page is requested with the encoded
version, which the site isn’t expecting, so it doesn’t work.
Is there a way to allow curly brackets in my URI, or another way to
download HTML that accepts them?
Thanks.
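For reference, here is a minimal sketch of what the %7B / %7D encoding does, assuming a Ruby recent enough to have URI.encode_www_form_component. The helper name search_uri is made up for illustration; nothing here fetches the page. It only shows that the encoded form parses and that decoding gives back the original brackets, which is what a server is normally expected to do before it reads the parameter.

```ruby
require 'uri'

# Hypothetical helper (not from the thread): build the search URL with
# only the query value percent-encoded.
def search_uri(value)
  # "{" becomes "%7B" and "}" becomes "%7D"; letters, digits and "-" are kept.
  encoded = URI.encode_www_form_component(value)
  URI.parse("http://www.google.co.uk/search?q=#{encoded}")
end

guid = '{3F2504E0-4F89-11D3-9A-0C-03-05-E8-2C-33-01}'
uri  = search_uri(guid)
puts uri.query

# Decoding the query the way a server usually would restores the brackets.
puts URI.decode_www_form(uri.query).to_h['q']
```

Whether a particular site decodes its parameters this way is a separate question, so treat this as a check on the client side only.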
On Wed, 28 Jun 2006, Richard Leonard wrote:
html = open(uri).read()
/usr/local/lib/ruby/1.8/uri/common.rb:432:in `split'
Thanks.
irb(main):001:0> URI.parse(URI.encode("http://www.google.co.uk/search?q={3F2504E0-4F89-11D3-9A-0C-03-05-E8-2C-33-01}"))
=> #<URI::HTTP:0xfdba8216c URL:http://www.google.co.uk/search?q={3F2504E0-4F89-11D3-9A-0C-03-05-E8-2C-33-01}>
-a
sorry. hit send too soon - here’s the rest:
irb(main):002:0> f = open uri
=> #<File:/tmp/open-uri11746.1>
irb(main):003:0> f.read.size
=> 17004
so encoding seems to work for me?
regards.
-a
The URL I’m actually trying to get is:
http://www.welshwalks.info/walksASP/routeSearchResult.asp?regionName=Isle+of+Anglesey&regionID={740C2C13-C8DB-49F7-81AE-3D23AC6ACCF2}
If you try that after passing it to URI.encode then the HTML I download
is their homepage - not the search results page you get if you paste
that into a browser.
Any ideas?
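One way to rule out a hand-encoding mistake in a URL with more than one parameter is to let the library assemble the query string. The sketch below assumes URI.encode_www_form is available; the helper name route_search_uri is hypothetical, and the parameter names are copied from the URL above. It does not contact the site and does not explain why the homepage comes back. It only shows what the encoded request line would look like and that both values survive a decode.

```ruby
require 'uri'

# Hypothetical helper: assemble the route-search URL from its two parameters.
def route_search_uri(region_name, region_id)
  # encode_www_form turns spaces into "+" and the brackets into %7B / %7D.
  query = URI.encode_www_form('regionName' => region_name,
                              'regionID'   => region_id)
  URI.parse("http://www.welshwalks.info/walksASP/routeSearchResult.asp?#{query}")
end

uri = route_search_uri('Isle of Anglesey',
                       '{740C2C13-C8DB-49F7-81AE-3D23AC6ACCF2}')
puts uri

# Decoding gives back the two name/value pairs unchanged.
p URI.decode_www_form(uri.query)
```

If the generated URL still returns the homepage, the cause is probably somewhere other than the brackets, but that is a guess and would need checking against the actual responses.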
Richard Leonard wrote:
From what I’ve found, curly brackets aren’t allowed in a Ruby URI
(despite my browser being capable of handling them).
Browsers lie.
require 'open-uri'
require 'cgi'
uri = URI.parse('http://www.google.co.uk/search?q=' +
                CGI.escape('{3F2504E0-4F89-11D3-9A-0C-03-05-E8-2C-33-01}'))
html = open(uri).read()
p html
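The snippet above escapes only the GUID, not the whole URL, and that detail matters. As a rough illustration (the two method names are invented for this sketch), compare escaping just the value with escaping the entire string using CGI.escape:

```ruby
require 'cgi'

# Escape only the query value; the scheme, host and "?q=" stay readable.
def escape_value(base, value)
  base + CGI.escape(value)
end

# Escape the whole string; ":", "/", "?" and "=" get rewritten as well,
# so the result is no longer usable as a URL.
def escape_everything(base, value)
  CGI.escape(base + value)
end

base = 'http://www.google.co.uk/search?q='
guid = '{3F2504E0-4F89-11D3-9A-0C-03-05-E8-2C-33-01}'

puts escape_value(base, guid)
puts escape_everything(base, guid)
```

On Ruby 3.0 and later, fetching with open-uri is written URI.open(uri) rather than open(uri).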
--
James B.
“Programs must be written for people to read, and only incidentally
for machines to execute.”
- H. Abelson and G. Sussman
(in "The Structure and Interpretation of Computer Programs")