I need to parse Wikipedia articles (written in Wikipedia’s markup) and redisplay them as HTML. Has anyone come across a library for this in Ruby? Are there any libraries that are good at it?
Thanks
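To show what a converter like this involves, here is a minimal sketch in Ruby. It is not an existing library: the module name TinyWikitext, the English Wikipedia base URL, and the subset of markup it handles (== headings ==, '''bold''', ''italic'', [[links]] and blank-line paragraphs) are all assumptions for illustration. Real articles lean heavily on templates, tables and references, which MediaWiki expands in its own PHP parser, so a toy like this will not render them.

```ruby
# Hypothetical, deliberately tiny wikitext-to-HTML converter (not a real gem).
# Handles only: == headings ==, '''bold''', ''italic'', [[links]] and
# blank-line-separated paragraphs. Templates, tables and refs are ignored.
module TinyWikitext
  BASE_URL = 'https://en.wikipedia.org/wiki/' # assumption: English Wikipedia

  module_function

  # Converts a wikitext string into an HTML fragment.
  def to_html(wikitext)
    html = []
    paragraph = []
    flush = lambda do
      html << "<p>#{inline(paragraph.join(' '))}</p>" unless paragraph.empty?
      paragraph.clear
    end

    wikitext.each_line do |raw|
      line = raw.strip
      if (m = line.match(/\A(={2,6})\s*(.+?)\s*\1\z/))
        flush.call
        level = m[1].length
        html << "<h#{level}>#{inline(m[2])}</h#{level}>"
      elsif line.empty?
        flush.call
      else
        paragraph << line
      end
    end
    flush.call
    html.join("\n")
  end

  # Escapes HTML special characters (apostrophes are kept, since wikitext
  # uses them for bold/italic), then expands links, bold and italic.
  def inline(text)
    html = text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
    html = html.gsub(/\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/) do
      target = Regexp.last_match(1).strip
      label = Regexp.last_match(2) || target
      %(<a href="#{BASE_URL}#{target.tr(' ', '_')}">#{label}</a>)
    end
    html = html.gsub(/'''(.+?)'''/, '<b>\1</b>')
    html.gsub(/''(.+?)''/, '<i>\1</i>')
  end
end
```

For example, TinyWikitext.to_html("== History ==\n'''Ruby''' is ''dynamic''.") gives an h2 heading followed by a paragraph containing b and i tags. Bold is expanded before italic so that ''' is not mistaken for '' plus a stray apostrophe.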
http://www.autopendium.co.uk
Stuff about old cars
Usually you shouldn’t run bots against Wikipedia; instead, download the free database dump and use that.
Read about their policy here:
If you have your own MediaWiki install and want to use a bot, you can check out the pywikipedia bot (pywikibot download | SourceForge.net). It’s not in Ruby, but it works great.
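If you go the database-dump route suggested above, the pages-articles dump is one very large XML file, so it is best read as a stream rather than loaded whole. The sketch below uses the REXML stream listener that ships with Ruby; DumpListener and each_page are hypothetical names, and it assumes the usual page, title, revision, text element layout and an already-decompressed file.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that yields (title, wikitext) for each <page> in a
# MediaWiki XML dump, without building the whole document in memory.
# Assumes the usual <page><title/><revision><text/></revision></page> layout.
class DumpListener
  include REXML::StreamListener

  def initialize(&on_page)
    @on_page = on_page
    @path = []       # stack of currently open element names
    @buffer = +''    # accumulates text for <title> or <text>
    @title = nil
  end

  def tag_start(name, _attrs)
    @path.push(name)
    @buffer = +'' if %w[title text].include?(name)
  end

  # REXML passes text with entities such as &amp; already decoded.
  def text(data)
    @buffer << data if %w[title text].include?(@path.last)
  end

  def cdata(data)
    text(data)
  end

  def tag_end(name)
    case name
    when 'title' then @title = @buffer.dup
    when 'text'  then @on_page.call(@title, @buffer.dup)
    end
    @path.pop
  end
end

# Streams pages from an IO (e.g. a File) or a String of dump XML.
def each_page(source, &block)
  REXML::Document.parse_stream(source, DumpListener.new(&block))
end
```

In practice you would open the decompressed dump with File.open and pass the file to each_page with a block that receives the title and wikitext of each article, for instance to store them in your own database.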
Actually, I’m not entirely sure that you shouldn’t use bots at all on Wikipedia. According to the link you provided:
“Robots or bots are automatic processes (http://en.wikipedia.org/wiki/Process_(computing)) that interact with Wikipedia as though they were human editors.”
That last bit sounds like they’re talking about a very specific kind of bot, not just a scraper.
RSL
I wrote that article a while ago. It would be interesting to use WWW::Mechanize, or better yet scRUBYt, both of which use Hpricot in the backend anyway.
Shane
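For comparison with WWW::Mechanize or scRUBYt, here is a sketch that fetches a single article with plain Net::HTTP instead. Rather than scraping the rendered HTML, it asks MediaWiki for the raw wikitext via index.php with action=raw. The helper names and the User-Agent string are placeholders; Wikimedia asks clients to send a descriptive User-Agent, and you should check their policy before fetching at any volume.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the index.php?action=raw URL that returns an
# article's wikitext. Spaces become underscores, as in Wikipedia page URLs.
def raw_article_uri(title, host: 'en.wikipedia.org')
  query = URI.encode_www_form(title: title.tr(' ', '_'), action: 'raw')
  URI::HTTPS.build(host: host, path: '/w/index.php', query: query)
end

# Hypothetical helper: fetches the raw wikitext for +title+ with Net::HTTP.
# The User-Agent value is a placeholder; replace it with your own details.
def fetch_raw_article(title)
  uri = raw_article_uri(title)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = 'MyWikiReader/0.1 (you@example.com)' # placeholder
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "HTTP #{response.code} fetching #{title}" unless response.is_a?(Net::HTTPSuccess)

  response.body.force_encoding(Encoding::UTF_8)
end
```

Fetching wikitext rather than HTML means you still need a parser on your side, but you avoid depending on the layout of the rendered page.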
“Robots or bots are automatic processes (http://en.wikipedia.org/wiki/Process_(computing)) that interact with Wikipedia as though they were human editors.” There’s nothing against screen scraping there. That policy is about bots that edit content. Otherwise, Google would be breaking WP policy.
This is taking the discussion a little off topic though.
-Nathan
If you just need to cache some pages for displaying later, screen scraping Wikipedia is a good choice compared to downloading the database. If you’re going to be parsing and redisplaying the content in real time, that is against Wikipedia’s policy.
See Wikipedia:Database_download#Why_not_just_retrieve_data_from_wikipedia.org_at_runtime.3F
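For the cache-and-display-later approach described above, a minimal sketch is a small on-disk cache so each page is fetched only once. PageCache is a hypothetical class: it stores each page under the SHA1 of its title and takes the actual fetcher as a block, so you could plug in Net::HTTP, WWW::Mechanize or anything else. It never expires entries, which is an assumption you would probably want to revisit.

```ruby
require 'fileutils'
require 'digest/sha1'

# Hypothetical on-disk page cache: the block is called only on a cache miss,
# and later reads for the same title come straight from disk.
# Entries never expire (simplifying assumption).
class PageCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Returns the cached body for +title+; on a miss, yields the title to the
  # given block (the fetcher), stores the result, and returns it.
  def fetch(title)
    path = File.join(@dir, Digest::SHA1.hexdigest(title) + '.txt')
    return File.read(path, encoding: 'UTF-8') if File.exist?(path)

    body = yield(title)
    File.write(path, body)
    body
  end
end
```

Hashing the title keeps file names safe regardless of slashes or other characters in article names, at the cost of not being able to read the title back from the file name.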