Ruby web crawler for https

luislavena · August 4, 2011, 1:42pm

There is Ruby code that collects all the URLs from your site and stores
them in a file.
I took it from http://snippets.dzone.com/posts/show/1893
As I understand it, that code only works for HTTP connections: it sends
GET requests over plain HTTP.
I need to scan an address of the form https://site.domain.ru
How do I edit this code?

stickz · August 4, 2011, 2:34pm

How to edit this code?

vi crawler.rb

xdg-open How To Ask Questions The Smart Way

stickz · August 4, 2011, 4:16pm

Change this line:

    if %r{http://([^/]+)/([^/]+)}i =~ $_

to this:

    if %r{https?://([^/]+)/([^/]+)}i =~ $_

– Sergey Avseyev
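
A minimal sketch of what that change does, and of the fetch side that may also need attention. The regex is the one from the patched line; `LINK_RE`, `extract_link`, and `fetch` are hypothetical names for illustration, and the assumption that the crawler fetches pages with Net::HTTP is not confirmed by the original snippet.

```ruby
require 'net/http'
require 'uri'

# Same pattern as the patched line: "s?" makes the "s" optional,
# so both http:// and https:// links match.
LINK_RE = %r{https?://([^/]+)/([^/]+)}i

# Hypothetical helper: returns [host, first_path_segment] for a line
# that contains a link, or nil when nothing matches.
def extract_link(line)
  m = LINK_RE.match(line)
  m && [m[1], m[2]]
end

# Hypothetical fetch helper (assumes Net::HTTP is used for requests):
# TLS is switched on only when the URL scheme is https.
def fetch(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri).body
  end
end
```

The regex change only affects which links are recognized. If the crawler opens connections on port 80 without TLS, the request itself probably also needs something like the `use_ssl` option shown in `fetch`.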

stickz · August 5, 2011, 9:23pm

Simple ETL (Extract Transform Load) Tool for web
https://github.com/alexeypetrushin/wetl

Here’s a complete example of how to fully grab and parse the membrana.ru
site:
https://github.com/alexeypetrushin/wetl/tree/master/examples/membrana