On Nov 6, 2008, at 10:18 AM, bwv549 wrote:
> using rindex this takes .17 sec on an 85MB file
> using index it takes .45 sec on the same file
> Those numbers argue that rindex indeed seeks from the end.
> Thanks for everyone’s help and I will check back to see if there are
> any more thoughts.
this one never reads more than pagesize bytes at a time, and it handles
a needle that straddles two pages by searching the current page plus the
first needle.size bytes of the previously read page (the one nearer the
end of the file).
the extra code just shows that it does, in fact, find its target.
percent is the fraction of the file read per page: raise it for speed or
lower it to save memory.
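to see why the carried-over bytes matter, here is a toy sketch on an in-memory string rather than a file. the data, the 4-byte page size, and the variable names are made up for illustration. searching each page on its own misses a needle that crosses a page boundary, while prepending the current page to the start of the previously read page finds it.

```ruby
data     = 'aaaXYZbb'   # toy stand-in for file contents (assumption)
needle   = 'XYZ'
pagesize = 4

# split into fixed-size pages: ["aaaX", "YZbb"]
pages = data.scan(/.{1,#{pagesize}}/m)

# naive approach: search every page in isolation
naive = pages.any? { |page| page.include?(needle) }

# walk the pages from the tail, keeping the first needle.size bytes
# of the previously read page appended to the current one
found = nil
buf   = ''
(pages.size - 1).downto(0) do |i|
  buf = pages[i] + buf[0, needle.size]
  rel = buf.index(needle)
  if rel
    found = i * pagesize + rel   # offset within data
    break
  end
end

puts "naive per-page search found it: #{naive}"
puts "search with carry-over found it at: #{found.inspect}"
```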
cfp:~ > cat a.rb
def tail_search io, needle, options = {}
  io = open io unless io.respond_to?(:read)
  percent = Float(options['percent']||options[:percent]||0.10)
  buf = ''
  size = io.stat.size
  pagesize = Integer(size * percent)   # bytes read per page
  pos = 0

  loop do
    pos -= pagesize                    # step back one page from the end
    break if pos.abs > size
    io.seek(pos, IO::SEEK_END)
    # current page plus the start of the previously read page
    buf = io.read(pagesize) + buf[0, needle.size]
    relative_index = buf.index(needle)
    if relative_index
      absolute_index = size + pos + relative_index
      return absolute_index
    end
  end

  return nil
ensure
  io.close rescue nil
end

needle = 'key=val'
index = tail_search(__FILE__, needle, :percent => 0.02)
if index
  open(__FILE__) do |fd|
    fd.seek index
    puts fd.read(needle.size)
  end
end

needle = 'io.close rescue nil'
index = tail_search(__FILE__, needle, :percent => 0.02)
if index
  open(__FILE__) do |fd|
    fd.seek index
    puts fd.read(needle.size)
  end
end

__END__
key=val
cfp:~ > ruby a.rb
key=val
io.close rescue nil
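
one caveat with the loop above: when the file size is not a multiple of pagesize, the leftover bytes at the head of the file are never searched, and a pagesize of zero (a tiny file with a small percent) never advances. a possible variant is sketched below. the name tail_search_all, the fixed pagesize argument, and the choice of rindex (to report the match nearest the end of the file) are assumptions for illustration, not part of the script above.

```ruby
# hypothetical variant: pages backwards through the whole file, including
# the partial page at the head, and returns the absolute offset of the
# match nearest the end of the file (or nil).
def tail_search_all(path, needle, pagesize = 4096)
  raise ArgumentError, 'pagesize must be positive' unless pagesize > 0

  File.open(path, 'rb') do |io|
    needle = needle.b       # compare raw bytes
    hi     = io.stat.size   # offset where the previously read page began
    carry  = ''.b           # first bytes of the previously read page

    while hi > 0
      lo = [hi - pagesize, 0].max   # clamp so the last page starts at 0
      io.seek(lo)
      buf = io.read(hi - lo) + carry
      rel = buf.rindex(needle)      # last match inside this window
      return lo + rel if rel
      # needle.bytesize - 1 bytes are enough to complete a straddling needle
      carry = buf[0, needle.bytesize - 1]
      hi = lo
    end

    nil
  end
end
```

usage would look like tail_search_all('some.log', 'key=val', 64 * 1024), where some.log is a placeholder path. memory use stays at roughly one page plus the needle size.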
a @ http://codeforpeople.com/