Duplicate elements in array

Hello

I am trying to output the duplicate elements in an array. I looked into
the Ruby API and found the uniq method, which returns the array with
duplicates removed. What I want is to know which elements are duplicated.
For example:

array = ["apple", "banana", "apple", "orange"]
=> ["apple", "banana", "apple", "orange"]
array.uniq
=> ["apple", "banana", "orange"]

I want the method to tell me that apple is the duplicated element.

I tried this, but it does not work:

array - array.uniq

Any ideas?

Regards
Shuaib
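
A note on why the attempt above comes back empty: Array#- removes every occurrence of each element found in the right-hand array, and array.uniq still contains one copy of every value, so nothing is left. A minimal sketch (the variable name `difference` is only for illustration):

```ruby
array = ["apple", "banana", "apple", "orange"]

# Array#- drops *every* occurrence of each element found in the other array.
# array.uniq still holds one copy of every value, so all elements are removed.
difference = array - array.uniq
p difference  # prints []
```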

On 10/28/07, Shuaib Z. wrote:

Here’s one way (I’m sure there must be a simpler approach - just can’t
think of it right now):

array = ["apple", "banana", "apple", "orange"]
counts = array.inject(Hash.new {|h,k| h[k] = 0 }) { |hash, item| hash[item] += 1; hash }
p counts #=> {"apple"=>2, "banana"=>1, "orange"=>1}
p counts.select { |k,v| v > 1 }.map{ |k, v| k}.flatten #=> ["apple"]

Regards,
Sean

Shuaib Z. wrote:


I don’t know a good way to do it, but one way to get the result would be
to force it into a hash since that eliminates duplicates.

I’m sure there’s a better way to do it, but here’s what I got.

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 # it doesn't matter what we store
  end
}

puts duplicates

Cheers
Mohit.

On 28.10.2007 14:16, Mohit S. wrote:


irb(main):007:0> array = %w{apple banana apple orange}
=> ["apple", "banana", "apple", "orange"]
irb(main):008:0> array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
=> ["apple"]

Kind regards

robert
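
For readers on a newer Ruby, the same counting idea can be sketched with Enumerable#tally. This assumes Ruby 2.7 or later, which postdates this thread; the names `counts` and `duplicates` are illustrative only.

```ruby
array = %w[apple banana apple orange]

# Enumerable#tally (Ruby 2.7 and later) builds the same count hash
# that the inject call above builds by hand.
counts = array.tally

# Hash#select returns a Hash on Ruby 1.9 and later, so #keys yields
# the values that occurred more than once.
duplicates = counts.select { |_item, count| count > 1 }.keys
p duplicates  # prints ["apple"]
```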

Sean O’Halpin wrote:


I so have to get the hang of inject, flatten and map.

Cheers,
Mohit.
10/28/2007 | 9:16 PM.
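
For anyone else getting the hang of inject, here is a rough longhand sketch of what the one-liners above do. The variable names are illustrative, and each_with_object is assumed to be available (Ruby 1.8.7 / 1.9 and later).

```ruby
array = %w[apple banana apple orange]

# Longhand version of array.inject(Hash.new(0)) { |ha, e| ha[e] += 1; ha }
counts = Hash.new(0)          # missing keys read as 0
array.each do |item|
  counts[item] += 1           # inject hands this same hash to every step
end

# each_with_object threads the accumulator automatically, so the block
# does not need to end with "; ha" to return the hash.
same_counts = array.each_with_object(Hash.new(0)) { |item, acc| acc[item] += 1 }

# Dropping the entries seen exactly once leaves the duplicated values.
duplicates = counts.reject { |_item, count| count == 1 }.keys
p duplicates  # prints ["apple"]
```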

Thanks a lot guys.
It works.

I really appreciate your help

Cheers
Shuaib

On 10/28/07, Robert K. wrote:

Succinct ~and~ efficient! Do you have a mail filter checking for any posts
containing ‘inject’? :)

Regards,
Sean

On 10/28/07, Shuaib Z. wrote:

arr, dup = ["apple", "banana", "apple", "orange"], []
(arr.length - 1).times do
  a = arr.shift
  dup << a if arr.include?(a)
end
p dup.uniq

Harry

On 10/28/07, Mohit S. wrote:

I so have to get the hang of inject, flatten and map.

Hi,

They are definitely worth looking into - inject in particular is a
powerful tool (Robert K. can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).

Regards,
Sean

# Mohit S. (with slight adjustment)

def duplicates_1(array)
  seen = { }
  duplicates = { }
  array.each {|item| seen.key?(item) ? duplicates[item] = true : seen[item] = true}
  duplicates.keys
end

# Robert K.

def duplicates_2(array)
  array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
end

# from facets

def duplicates_3(array)
  array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end

require 'benchmark'

def do_benchmark(title, n, methods, *args, &block)
  puts '-' * 40
  puts title
  puts '-' * 40
  Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
    methods.each do |meth|
      x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
    end
  end
end

# get some data (Ubuntu specific I guess - YMMV)

array = File.read('/etc/dictionaries-common/words').split(/\n/)

# test w/o dups

do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)

# create some duplicates

array = array[0...999] * 100
do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)

__END__
$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    2.200000   0.010000   2.210000 (  2.215057)
duplicates_2    5.820000   0.000000   5.820000 (  5.812414)
duplicates_3    6.580000   0.010000   6.590000 (  6.586708)

Sean O’Halpin wrote:

----------------------------------------
duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    1.560000   0.000000   1.560000 (  1.562587)
duplicates_2    2.660000   0.000000   2.660000 (  2.665301)
duplicates_3    2.590000   0.000000   2.590000 (  2.595189)

Thanks Sean! Makes me feel quite nice about it.

So, hashes are faster than arrays?

Cheers,
Mohit.
10/29/2007 | 2:13 AM.

On 10/28/07, Mohit S. wrote:

So, hashes are faster than arrays?

It depends on what you’re doing with them and how big they are. In this
instance, though, I changed your solution to use a hash because you were
appending the duplicates to an array, which adds an entry to that array
every time a duplicate is detected, so an item that occurs n times is
listed n - 1 times. This didn’t show up in your example because your data
contained at most two instances of an item. If you change your example to:

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 # it doesn't matter what we store
  end
}

puts duplicates

it outputs

apple
cow
apple
apple

which is probably not what you want.

Regards,
Sean

Duplicate elements in array
Posted by Shuaib Z. (shuaib85) on 28.10.2007 13:47

Here’s yet another way to do it:
http://snippets.dzone.com/posts/show/4148

Cheers,

j.k.

Sean O’Halpin wrote:


Thanks for the explanation, Sean. Actually, I guess it’s not clear if
the OP wants to know each occurrence of the duplicates or just the list
of duplicates. But, there are now solutions for both cases!

Cheers,
Mohit.
10/29/2007 | 11:44 AM.
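
The two readings of the question can be put side by side. The following is only a sketch, assuming the standard-library Set class; `every_repeat` and `distinct_duplicates` are made-up names, not anything from the posts above.

```ruby
require "set"

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]

seen = Set.new
# Set#add? returns nil when the item is already in the set, so reject keeps
# exactly the repeat occurrences (what the array-based version printed).
every_repeat = array.reject { |item| seen.add?(item) }

# Collapsing the repeats lists each duplicated value once.
distinct_duplicates = every_repeat.uniq

p every_repeat         # prints ["apple", "cow", "apple", "apple"]
p distinct_duplicates  # prints ["apple", "cow"]
```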

From: Sean O’Halpin

$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    2.200000   0.010000   2.210000 (  2.215057)
duplicates_2    5.820000   0.000000   5.820000 (  5.812414)
duplicates_3    6.580000   0.010000   6.590000 (  6.586708)
----------------------------------------
duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    1.560000   0.000000   1.560000 (  1.562587)
duplicates_2    2.660000   0.000000   2.660000 (  2.665301)
duplicates_3    2.590000   0.000000   2.590000 (  2.595189)

I just tested this using Ruby 1.9 on a P4 box running Windows XP. I
included Ruby’s group_by and got surprising results.

C:\ruby1.9\bin>diff test-old.rb test.rb
19a20,24
>
> #1.9's group_by
> def duplicates_4(array)
> array.group_by{|e|e}.select{|_,k| k.size>1}.keys
> end
26c31
< Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
---
> Benchmark.bmbm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
34c39
< array = File.read('/etc/dictionaries-common/words').split(/\n/)
---
> array = File.read('american-english').split(/\n/)
38c43
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)
43c48
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)

C:\ruby1.9\bin>

C:\ruby1.9\bin>ruby test.rb
----------------------------------------
no duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    7.609000   0.094000   7.703000 (  7.984000)
duplicates_2   10.438000   0.109000  10.547000 ( 11.608000)
duplicates_3   14.609000   0.219000  14.828000 ( 14.874000)
duplicates_4   11.422000   0.141000  11.563000 ( 14.201000)
--------------------------------------- total: 44.641000sec

                    user     system      total        real
duplicates_1    7.219000   0.125000   7.344000 (  8.109000)
duplicates_2    9.844000   0.078000   9.922000 ( 10.374000)
duplicates_3   14.391000   0.172000  14.563000 ( 18.498000)
duplicates_4   11.172000   0.172000  11.344000 ( 12.998000)
----------------------------------------
duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    3.375000   0.000000   3.375000 (  3.765000)
duplicates_2    3.218000   0.000000   3.218000 (  3.828000)
duplicates_3    3.250000   0.000000   3.250000 (  3.672000)
duplicates_4    2.032000   0.047000   2.079000 (  2.077000)
--------------------------------------- total: 11.922000sec

                    user     system      total        real
duplicates_1    3.375000   0.000000   3.375000 (  3.437000)
duplicates_2    3.188000   0.000000   3.188000 (  3.218000)
duplicates_3    3.219000   0.015000   3.234000 (  3.281000)
duplicates_4    1.844000   0.000000   1.844000 (  1.859000)

C:\ruby1.9\bin>

kind regards -botp
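
To make the group_by variant easier to follow, here is a small sketch of its intermediate result, assuming Ruby 1.9 or later (where Hash#select returns a Hash). The names `groups` and `duplicates` are illustrative.

```ruby
array = %w[apple banana apple orange]

# group_by with an identity block collects equal elements under one key, so
# groups == { "apple" => ["apple", "apple"], "banana" => ["banana"], "orange" => ["orange"] }
groups = array.group_by { |e| e }

# Keep only the groups holding more than one element; their keys are the duplicates.
duplicates = groups.select { |_key, members| members.size > 1 }.keys
p duplicates  # prints ["apple"]
```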

2007/10/28, Sean O’Halpin wrote:

Succinct ~and~ efficient!

Thanks!

Do you have a mail filter checking for any posts
containing ‘inject’? :)

I don’t need that since most of them were written by me. :) (slight
exaggeration)
chuckle

Kind regards

robert