Problem scraping using nokogiri - getting wrong characters

Hi all,

I am scraping a table from another site and inserting it into my
site. You can see an example on the initial page at:
http://mthosts.heroku.com.
I’m referring to the green box with the Snowbird weather and snowfall
information.

This box has been scraped from the Snowbird site at:
http://www.snowbird.com/ski_board/snowreport.php

The problem is that on the snowbird site it has degree symbols (°) but
on my page it shows up as: (�)

I think it has something to do with the encoding, but I’m pretty new to
HTML etc. and am not sure what I can do to fix this. I’ve tried
substituting the characters and some other things but haven’t had any
success yet.

Any ideas?

thanks,

max

Hi!

I opened the HTML source of the snowreport.php page and noticed that
the strange symbols you mentioned are HTML-encoded characters: the
degree sign (°) is written as an HTML entity.

I had a similar problem last Monday, but I couldn’t completely solve
it.

Try the lib: http://htmlentities.rubyforge.org/

or use a regular expression (sub, gsub) to replace the entity with the
degree symbol.
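
A minimal sketch of the gsub approach, in case it helps. The entity spellings (&deg; and &#176;), the sample string, and the helper name replace_degree_entities are assumptions for illustration; I have not checked which form the Snowbird page actually uses.

```ruby
# Assumed spellings of the degree sign as an HTML entity:
# the named form and the decimal numeric form.
DEGREE_ENTITIES = /&deg;|&#176;/

# Hypothetical helper: swap each degree entity for the literal character.
def replace_degree_entities(html)
  html.gsub(DEGREE_ENTITIES, "\u00B0")
end

# Made-up sample markup; the real table cells may look different.
sample = '<td>Base temp: 28&deg;F</td>'
puts replace_degree_entities(sample)
```

This only handles the degree sign. For arbitrary entities, a library such as the htmlentities one linked above is probably the better choice.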

Regards,

Everaldo

I tried that, but it didn’t work for me. What did work was explicitly
setting the encoding property in Nokogiri:

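require 'nokogiri'
require 'open-uri'
url = 'http://www.snowbird.com/ski_board/snowreport.php'
page = Nokogiri::HTML(URI.open(url)) # on Ruby older than 2.5, use open(url)
page.encoding = 'utf-8' # serialize the document as UTF-8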

Worked great after that!
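
One possible explanation for why the encoding matters, sketched with plain Ruby strings. This assumes the scraped markup reaches the page as ISO-8859-1 bytes while the page itself is served as UTF-8; that is a guess and has not been verified against the Snowbird site. The helper name latin1_to_utf8 is hypothetical.

```ruby
# Hypothetical helper: treat raw bytes as ISO-8859-1 and transcode to UTF-8.
# The source encoding is an assumption about the scraped page.
def latin1_to_utf8(raw)
  raw.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "28\xB0F".b # degree sign as the single Latin-1 byte 0xB0

# Labeling the same bytes as UTF-8 without converting gives an invalid
# string, the kind of input a browser shows as the replacement character.
mislabeled = raw.dup.force_encoding(Encoding::UTF_8)
puts mislabeled.valid_encoding? # false

# Transcoding turns the byte into the two-byte UTF-8 sequence 0xC2 0xB0.
fixed = latin1_to_utf8(raw)
puts fixed.valid_encoding? # true
puts fixed
```

If that assumption holds, setting the document encoding to UTF-8 before serializing would have the same effect as the transcoding step here.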

thx,

Max