Is anybody out there working on a Ruby-based parser for the WARC (Web
ARChive) file format that (relatively) recently replaced the ARC format?
Or perhaps Ruby wrappers around WARC-parsing utilities?

Hello, I wrote a Ruby WARC parser a while ago and made it available
on the intertubes today. There is no documentation right now, but it
shouldn't be too hard to figure out how to use it by looking at the
tests. I don't plan on working on it much, but I would be glad if you
could contribute to it.
Source code: antoinerg/warc-ruby on GitHub (warc is a pure Ruby implementation of a Web ARChive file reader and writer)
As a gem: gem install warc
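
For anyone who only wants to peek inside an uncompressed WARC file, the record layout is simple enough to walk by hand: a "WARC/x.y" version line, "Name: value" header lines, a blank line, then exactly Content-Length bytes of content followed by two CRLFs. The sketch below is a rough illustration of that layout and nothing more. It does not use the warc gem and does not reflect its API; the method name each_warc_record is made up for this example.

```ruby
require "stringio"

# Hypothetical helper for illustration only; NOT part of the warc gem.
# Yields [version, headers, body] for each record in an uncompressed WARC stream.
def each_warc_record(io)
  return enum_for(:each_warc_record, io) unless block_given?

  while (line = io.gets("\r\n"))
    version = line.chomp("\r\n")
    next if version.empty? # blank lines that separate records

    unless version.start_with?("WARC/")
      raise ArgumentError, "expected a WARC version line, got #{version.inspect}"
    end

    # Header block: "Name: value" lines up to the first empty line.
    headers = {}
    while (line = io.gets("\r\n"))
      line = line.chomp("\r\n")
      break if line.empty?
      name, value = line.split(":", 2)
      headers[name.strip] = value.to_s.strip
    end

    # The content block is exactly Content-Length bytes long.
    length = Integer(headers.fetch("Content-Length"))
    body = io.read(length)

    yield version, headers, body
  end
end

# Tiny in-memory sample with two records (made-up data).
sample = "WARC/1.0\r\nWARC-Type: resource\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n" \
         "WARC/1.0\r\nWARC-Type: metadata\r\nContent-Length: 0\r\n\r\n\r\n\r\n"

each_warc_record(StringIO.new(sample)) do |version, headers, body|
  puts "#{version} #{headers['WARC-Type']} (#{body.bytesize} bytes)"
end
```

With a real file, open it in binary mode (File.open(path, "rb")) so that Content-Length is counted in bytes. Gzipped .warc.gz files would need to be decompressed first, and multi-line (folded) header values are not handled here.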