On 01/28/2011 05:09 PM, Jos B. wrote:
Hi,
I’m trying to inflate a set of concatenated gzipped blobs stored in a single
file. As it stands, Zlib::GzipReader only inflates the first blob. It
appears that the unused instance method would return the remaining data,
ready to be passed into Zlib::GzipReader, but it yields an error:
method `method_missing' called on hidden T_STRING object
What could be going on here?
I’m not sure what’s going on, but I was hoping you could solve your
problem by running something like this:
File.open('gzipped.blobs') do |f|
  begin
    loop do
      Zlib::GzipReader.open(f) do |gz|
        puts gz.read
      end
    end
  rescue Zlib::GzipFile::Error
    # End of file reached.
  end
end
Unfortunately, Ruby 1.8 doesn’t appear to support passing anything other
than a file name to Zlib::GzipReader.open, and Ruby 1.9 seems to always
reset the file position to the beginning of the file prior to starting
extraction when you really need it to just start working from the
current position. So it doesn’t appear that you can do this with the
standard library.
As part of a ZIP library I wrote, there is a more general implementation
of a Zlib stream filter. Install the archive-zip gem and then try the
following:
gem 'archive-zip'
require 'archive/support/zlib'
File.open('gzipped.blobs') do |f|
  until f.eof? do
    Zlib::ZReader.open(f, 15 + 16) do |gz|
      gz.delegate_read_size = 1
      puts gz.read
    end
  end
end
This isn’t super efficient because we have to hack the
delegate_read_size to be 1 byte in order to ensure that the trailing
gzip data isn’t sucked into the read buffer of the current ZReader
instance and hence lost between iterations. It shouldn’t be too bad
though since the File object should be handling its own buffering.
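
To make the read-ahead problem concrete, here is a rough sketch using the standard library's GzipReader as a stand-in for any buffering reader (it is not ZReader itself). It assumes a Ruby where Zlib::GzipReader.new accepts an already-open IO object, which may not hold on the versions discussed here; gzip_blob is a hypothetical helper that exists only for this illustration.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip a string in memory and return the raw bytes.
def gzip_blob(text)
  io = StringIO.new(''.b)
  gz = Zlib::GzipWriter.new(io)
  gz.write(text)
  gz.close
  io.string
end

first  = gzip_blob("first blob\n")
second = gzip_blob("second blob\n")
io = StringIO.new(first + second)

gz = Zlib::GzipReader.new(io)
first_text = gz.read   # inflates only the first blob
gz.finish              # release the reader but leave io open

# Bytes the reader pulled from io beyond the end of the first blob.
over_read = io.pos - first.bytesize
puts "read #{first_text.inspect}, over-read #{over_read} bytes"
```

Whatever was over-read sits in the reader's buffer, not in the underlying IO, so the next reader opened on the same IO would start too late. Reading one byte at a time avoids that at the cost of many small reads.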
BTW, I wrote some pretty detailed documentation for Zlib::ZReader. It
should explain what the 15 + 16 is all about in the open method in case
you need to tweak things for your own streams.
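
In short, the value follows zlib's window bits convention: 15 is the maximum window size, and adding 16 tells zlib to expect a gzip header and trailer around the compressed data. The sketch below shows the same convention with the standard library's Zlib::Inflate, on the assumption that ZReader hands the value to zlib in the same way; gzip_blob is a hypothetical helper for the example.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip a string in memory and return the raw bytes.
def gzip_blob(text)
  io = StringIO.new(''.b)
  gz = Zlib::GzipWriter.new(io)
  gz.write(text)
  gz.close
  io.string
end

blob = gzip_blob("hello gzip\n")

# 15 is the maximum window size (Zlib::MAX_WBITS); adding 16 asks zlib to
# parse the gzip header and trailer itself.
inflater = Zlib::Inflate.new(15 + 16)
text = inflater.inflate(blob)
inflater.close
puts text

# With plain 15, zlib expects a zlib-format header and rejects the gzip blob.
begin
  Zlib::Inflate.new(15).inflate(blob)
  rejected = false
rescue Zlib::Error
  rejected = true
end
puts "plain 15 rejected the gzip blob: #{rejected}"
```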
On a related note, Zlib::GzipReader#{pos,tell} returns the position in the
output stream (zstream.total_out) whereas I am looking for the position in
the input stream. I tried making zstream.total_in available but the value
appears to be 18 bytes short in my test file, that is, the next header is
found 18 bytes beyond what zstream.total_in reports.
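I think total_in is counting only the compressed data; however, each
gzip blob also has a header before the compressed data and a trailer
after it. A minimal 10-byte header plus the 8-byte trailer would
account for your 18 bytes. You could probably add 18 to whatever you
get as long as the headers carry no optional fields, but as I noted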
earlier, the implementation of GzipReader seems to always reset any file
object back to the beginning of the stream rather than start processing
it from an existing position. I can’t find any documentation listing a
way to force GzipReader to jump to any other file position after
initialization either.
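
For what it's worth, the 18-byte gap is consistent with gzip framing: a header of at least 10 bytes and an 8-byte trailer (CRC32 plus uncompressed length) around the deflate data. Here is a minimal sketch of that arithmetic using a raw Zlib::Inflate from the standard library. It assumes a header with no optional fields such as a stored file name, and gzip_blob is a hypothetical helper for the example.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip a string in memory and return the raw bytes.
def gzip_blob(text)
  io = StringIO.new(''.b)
  gz = Zlib::GzipWriter.new(io)
  gz.write(text)
  gz.close
  io.string
end

blob = gzip_blob("some payload\n" * 20)

header_size  = 10  # minimal gzip header, no optional fields (assumption)
trailer_size = 8   # CRC32 plus uncompressed length

# Negative window bits select raw deflate, so the header is skipped by hand.
raw = Zlib::Inflate.new(-Zlib::MAX_WBITS)
raw.inflate(blob.byteslice(header_size..-1))
compressed_size = raw.total_in   # counts the deflate data only
raw.close

next_header_offset = compressed_size + header_size + trailer_size
puts "total_in=#{compressed_size}, next header at #{next_header_offset}"
```

If a blob does carry a file name, comment, or extra field, the header grows and a fixed offset of 18 will land in the wrong place.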
Does anybody know how to make the library return the correct offset into the
input stream so multiple compressed blobs can be handled?
Hopefully, my solution will work for you because I don’t think the
current implementation in the standard library will do what you need.
-Jeremy