On 10/28/07, Mohit S. [email protected] wrote:
I so have to get the hang of inject, flatten and map.
Cheers,
Mohit.
10/28/2007 | 9:16 PM.
Hi,
They are definitely worth looking into - inject in particular is a
powerful tool (Robert K. can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).
Regards,
Sean
Mohit S. (with slight adjustment)
def duplicates_1(array)
seen = { }
duplicates = { }
array.each {|item| seen.key?(item) ? duplicates[item] = true :
seen[item] = true}
duplicates.keys
end
Robert K.
def duplicates_2(array)
array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v|
v==1}.keys
end
from facets
def duplicates_3(array)
array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end
require ‘benchmark’
def do_benchmark(title, n, methods, *args, &block)
puts ‘-’ * 40
puts title
puts ‘-’ * 40
Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
methods.each do |meth|
x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
end
end
end
get some data (Ubuntu specific I guess - YMMV)
array = File.read(‘/etc/dictionaries-common/words’).split(/\n/)
test w/o dups
do_benchmark(‘no duplicates’, 10, [:duplicates_1, :duplicates_2,
:duplicates_3], array)
create some duplicates
array = array[0…999] * 100
do_benchmark(‘duplicates’, 10, [:duplicates_1, :duplicates_2,
:duplicates_3], array)
END
$ ruby bm-duplicates.rb
no duplicates
user system total real
duplicates_1 2.200000 0.010000 2.210000 ( 2.215057)
duplicates_2 5.820000 0.000000 5.820000 ( 5.812414)
duplicates_3 6.580000 0.010000 6.590000 ( 6.586708)