On Wed, Jan 31, 2007 at 06:37:14PM +0900, Shot (Piotr S.) wrote:
Eric H.:
You may want to define #eql? in terms of #to_a as well…
Thanks, but (considering Mauricio F.’s fix) what would be the
benefit of defining #eql? in terms of Set#sort instead of Set#==?
“None”. You cannot just
def eql?(o); sort == o.sort end
either, since there’s no guarantee there’s a total order associated to
the
set (it seems my reply to Pit where I pointed this out was dropped too):
require ‘set’
Set.new([‘a’, 1]).sort # =>
~> -:2:in `sort’: comparison of String with 1 failed (ArgumentError)
~> from -:2
This is why the hash implementation I gave you does
map{|x| x.hash}.sort.hash # sorting hash values, i.e. Integers
instead of
sort.map{|x| x.hash}.hash
Even when the elements can be ordered, sort would be O(n ln n) vs. #=='s
O(n)
Of course, the former is a “fast NlnN”, being written in C, so it would
seem
that it can keep the speed advantage up to large values of N.
However, #sort will process the whole array before it can start
comparing
values, and #== can begin right away. If there are many differing
elements,
the probability of finding one of them in the first comparisons is very
high.
Somewhat surprisingly, #== can beat #sort even in the most favorable
case for
the latter (only one differing element, the first in the sorted array
but not
in Hash#to_a, small sets):
require ‘benchmark’
require ‘set’
size = 64
elm = -10
elm2 = -2
s1 = Set.new((0…size).to_a << -1)
s2 = Set.new((0…size).to_a << elm)
GC.disable # don’t let this interfere
def bm(s1, s2, elm, iterations = 5000)
puts “#== will stop after #{1 + s2.to_a.index(elm)} .include?
tests”
Benchmark.bmbm(10) do |bm|
bm.report(“Set#sort”) { iterations.times{ s1.sort == s2.sort } }
bm.report(“Set#==”) { iterations.times{ s1 == s2 } }
end
end
bm(s1, s2, elm)
puts “-” * 40
bm(s1, Set.new((0…size).to_a << elm2), elm2)
>> #== will stop after 43 .include? tests
>> Rehearsal ---------------------------------------------
>> Set#sort 0.700000 0.050000 0.750000 ( 0.768818)
>> Set#== 0.370000 0.010000 0.380000 ( 0.381117)
>> ------------------------------------ total: 1.130000sec
>>
>> user system total real
>> Set#sort 0.680000 0.060000 0.740000 ( 0.736052)
>> Set#== 0.380000 0.010000 0.390000 ( 0.380995)
>> ----------------------------------------
>> #== will stop after 7 .include? tests
>> Rehearsal ---------------------------------------------
>> Set#sort 0.620000 0.010000 0.630000 ( 0.637358)
>> Set#== 0.170000 0.000000 0.170000 ( 0.175028)
>> ------------------------------------ total: 0.800000sec
>>
>> user system total real
>> Set#sort 0.750000 0.050000 0.800000 ( 0.807318)
>> Set#== 0.170000 0.010000 0.180000 ( 0.176633)