There is a Ruby script that collects all the URLs from your site and
stores them in a file.
I took it from http://snippets.dzone.com/posts/show/1893
As I understand it, the code only works over plain HTTP connections and
sends GET requests.
I need to scan an address of the form https://site.domain.ru
How should I edit the code?
Change this line:
if %r{http://([^/]+)/([^/]+)}i =~ $_
to this:
if %r{https?://([^/]+)/([^/]+)}i =~ $_
– Sergey Avseyev
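The pattern change suggested above can be checked in isolation. The following is a minimal sketch, not the original script: the names LINK_PATTERN, extract_link and build_client are hypothetical and introduced only for illustration. It also assumes the script fetches pages with Net::HTTP, in which case matching https links is probably not enough on its own, because Net::HTTP needs use_ssl switched on before it can talk to an https address.

```ruby
require 'net/http'
require 'uri'

# Pattern from the answer above: "s?" makes the "s" optional,
# so both http:// and https:// links match.
LINK_PATTERN = %r{https?://([^/]+)/([^/]+)}i

# Hypothetical helper: returns [host, first path segment],
# or nil when the line contains no matching link.
def extract_link(line)
  match = LINK_PATTERN.match(line)
  match && [match[1], match[2]]
end

# Hypothetical helper: builds a Net::HTTP client for the URL
# without opening a connection. TLS is enabled only for https.
def build_client(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http
end

p extract_link('<a href="https://site.domain.ru/news/index.html">')
p extract_link('<a href="http://site.domain.ru/news/index.html">')
```

Both calls should return the same host and path segment, which shows that the edited pattern still accepts plain http links.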
A simple ETL (Extract, Transform, Load) tool for the web:
https://github.com/alexeypetrushin/wetl
Here’s a complete example of how to fully grab and parse the
membrana.ru site:
https://github.com/alexeypetrushin/wetl/tree/master/examples/membrana