Hello,
I need some advice on where to start digging (and how), because
I’m a bit confused.
I’d like to be able to grab all content from a website. Using nokogiri I
can use XPath and get blog post content among other things from a web
page. But I don’t have a clue where to start looking in order to
crawl a whole website, following every link that belongs to that
website.
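To make the question concrete, here is a minimal sketch of the kind of crawl I mean: start from one page, collect the links that stay on the same host, and visit each of them once. Everything named here is made up for illustration (the helpers `same_site_links` and `crawl`, the `SITE` hash, the `blog.example` URLs). A regex stands in for nokogiri only so the sketch has no dependencies, and the in-memory hash stands in for real HTTP requests.

```ruby
require "set"
require "uri"

# Return absolute links found in `html` that stay on the same host as `page_url`.
# Assumption: a regex is good enough for this sketch; with nokogiri one would
# select the href attributes with XPath instead.
def same_site_links(html, page_url)
  host = URI.parse(page_url).host
  links = html.scan(/<a\s[^>]*href=["']([^"'#]+)/i).flatten.map do |href|
    begin
      uri = URI.join(page_url, href) # resolve relative links against the page
      uri.host == host ? uri.to_s : nil
    rescue URI::Error
      nil # skip hrefs that are not valid URIs
    end
  end
  links.compact.uniq
end

# Breadth-first crawl. `fetch` is any callable taking a URL and returning the
# page HTML, or nil when the page cannot be retrieved.
def crawl(start_url, fetch, limit: 50)
  seen = Set[start_url]
  queue = [start_url]
  pages = {}
  until queue.empty? || pages.size >= limit
    url = queue.shift
    html = fetch.call(url)
    next if html.nil?
    pages[url] = html
    same_site_links(html, url).each do |link|
      queue << link if seen.add?(link) # add? returns nil for already-seen URLs
    end
  end
  pages
end

# Hypothetical three-page site kept in memory so the sketch runs offline.
SITE = {
  "http://blog.example/" =>
    '<a href="/post-1">one</a> <a href="http://other.example/">external</a>',
  "http://blog.example/post-1" =>
    '<a href="/">home</a> <a href="/post-2#comments">two</a>',
  "http://blog.example/post-2" => "<p>no links here</p>"
}

pages = crawl("http://blog.example/", ->(url) { SITE[url] })
puts pages.keys
```

In a real run the lambda would presumably be replaced by something that downloads the page, and each stored page would then be parsed for post content.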
Is nokogiri the right tool, or should I use something like mechanize? Can
you give me any hints on how to scrape an entire website?
I’m mostly interested in blogs, on the WordPress and Blogger platforms
for the time being.
Best Regards,
Panagiotis A.