I’m trying to retrieve search results from the internet using nokogiri
and open-uri. Apparently ‘open-uri’ can’t handle UTF-8 directly, so I’m
trying to convert the string to ASCII, but I still get an error.
Here is the chunk of code:
# encoding: UTF-8
require "nokogiri"
require "open-uri"
word = "Ελληνικά"
ascii_word = word.force_encoding("ASCII").to_s
result = open("Lycos.com",
"User-Agent" => "HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Windows NT
6.0; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.47 S
doc = Nokogiri::HTML(result)
And the error I get is:
[…]:in `open': invalid byte sequence in US-ASCII (ArgumentError) from lycos.rb:8:in ’
I’m on Mac OS X Mountain Lion, using Ruby 1.9.3 (via rvm).
I tried using ‘force_encoding("US-ASCII")’ but it’s not a recognized
format. The word is Greek and uses UTF-8. Any ideas would be welcome.
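
For what it's worth, the error above is consistent with how force_encoding behaves: it only relabels the string and leaves the bytes alone, so the Greek bytes are still there and are simply invalid as US-ASCII. A minimal sketch of that, assuming nothing beyond core Ruby (the variable name relabeled is mine, not from the original script):

```ruby
# encoding: UTF-8
word = "Ελληνικά"

# force_encoding only changes the encoding label; the bytes are untouched.
relabeled = word.dup.force_encoding("US-ASCII")
puts relabeled.bytes == word.bytes   # the byte sequence is identical
puts relabeled.valid_encoding?       # those bytes are not valid US-ASCII

# encode attempts a real conversion and raises, because Greek letters
# have no US-ASCII representation.
begin
  word.encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  puts "cannot convert: #{e.message}"
end
```

So neither call can produce a usable ASCII version of the word; the non-ASCII bytes have to be escaped rather than converted.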
On Thu, 8 Nov 2012 01:07:41 +0900, Panagiotis A. wrote:
> ascii_word = word.force_encoding("ASCII").to_s
As per RFC (2396?), you need to encode the non-ASCII bit, thusly:
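#!/usr/bin/ruby
# encoding: UTF-8
require "nokogiri"
require "open-uri"
# Percent-encode the Greek word so it is safe to place in a URL.
word = URI.encode("Ελληνικά")
# The header value is split across two adjacent literals so that it
# contains no line break.
result = open("Lycos.com",
  "User-Agent" =>
    "HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Windows NT 6.0; " \
    "en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.47")
doc = Nokogiri::HTML(result)
puts doc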
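
A note for anyone reading this on a newer Ruby: URI.encode (an alias of URI.escape) is available on 1.9.3 but was later deprecated and removed in Ruby 3.0. A rough equivalent for a single query value is URI.encode_www_form_component, sketched below. The host, path and q parameter in the URL are placeholders of mine for illustration only, not the address used in the original script.

```ruby
# encoding: UTF-8
require "uri"

word = "Ελληνικά"

# Percent-encode the UTF-8 bytes so the query value is plain ASCII.
escaped = URI.encode_www_form_component(word)
puts escaped
puts escaped.ascii_only?

# Hypothetical search URL, for illustration only; the real host, path and
# parameter name depend on the site being queried.
url = "http://search.example.com/web?q=#{escaped}"
puts URI.parse(url).query
```

Note that encode_www_form_component follows form-encoding rules, so a space becomes "+" rather than "%20". That is usually fine inside a query string but not inside a path segment.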
If one were to examine the Ruby URI docs, how would one know that there
is a method named URI.escape?
Try, e.g.:
ri URI::Escape.escape
But to write that, you already have to know there is an escape() method
in some namespace somewhere. How come, when I look at the docs, they
don’t list the methods that I can call on URI? Isn’t that the purpose
of the docs?
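
One way around the chicken-and-egg problem is to ask Ruby itself rather than the docs. As far as I can tell, on 1.9.3 escape is defined in the URI::Escape module and mixed into URI with extend, which would explain why it is not listed on the URI page itself. The sketch below uses only core reflection methods; :parse is just an example method name I picked.

```ruby
require "uri"

# Methods callable directly on the URI module, including any that were
# mixed in from other modules with `extend`.
puts URI.singleton_methods.sort.inspect

# Find out which module or class actually defines a given method.
# :parse is used here only as an example name.
puts URI.method(:parse).owner.inspect

# Search by name fragment when you do not know the exact method name.
puts URI.singleton_methods.grep(/enc|esc/).sort.inspect
```

The owner reported by the second call is the name to hand to ri.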