Duplicate elements in array

shuaib85 · October 28, 2007, 1:47pm

Hello

I am trying to output the duplicate elements in an array. I looked through
the Ruby API and found the uniq method, which returns the array with the
duplicates removed. What I want is to know which elements are duplicated.
For example:

array = ["apple", "banana", "apple", "orange"]
=> ["apple", "banana", "apple", "orange"]
array.uniq
=> ["apple", "banana", "orange"]

I want the method to tell me that apple is the duplicated element.

I tried this, but it does not work:

array - array.uniq

Any ideas?

Regards
Shuaib
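Why the attempt above comes back empty: `Array#-` removes every element of the receiver that appears anywhere in the argument, not one copy per match, so subtracting `array.uniq` removes everything. The sketch below shows that, plus one simple (if slow) alternative using `Array#count`; it is an illustration, not one of the answers given in this thread.

```ruby
array = ["apple", "banana", "apple", "orange"]

# Array#- drops *every* occurrence of each element found in the argument,
# so removing the uniq'd list leaves nothing behind.
p(array - array.uniq)                             # prints []

# One simple alternative: keep elements that occur more than once, then
# drop the repeats. Array#count rescans the whole array for each element,
# so this is O(n^2) and only suitable for small arrays.
p array.select { |e| array.count(e) > 1 }.uniq    # prints ["apple"]
```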

shuaib85 · October 28, 2007, 2:12pm

On 10/28/07, Shuaib Z. [email protected] wrote:


Here’s one way (I’m sure there must be a simpler approach - just can’t
think of it right now):

array = ["apple", "banana", "apple", "orange"]
# count occurrences; the block gives each new key a starting value of 0
counts = array.inject(Hash.new {|h,k| h[k] = 0 }) { |hash, item| hash[item] += 1; hash }
p counts #=> {"apple"=>2, "banana"=>1, "orange"=>1}
p counts.select { |k,v| v > 1 }.map{ |k, v| k}.flatten #=> ["apple"]

Regards,
Sean
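For comparison, on a current Ruby (2.7 or later, so not available when this thread was written) `Enumerable#tally` builds the same item-to-count hash that the `inject` call above produces. A sketch of the same idea, assuming a modern interpreter:

```ruby
array = ["apple", "banana", "apple", "orange"]

# tally returns a Hash of element => number of occurrences (Ruby 2.7+).
counts = array.tally
p counts

# Hash#select returns a Hash on Ruby 1.9+, so .keys gives the duplicated items.
p counts.select { |_item, n| n > 1 }.keys   # prints ["apple"]
```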

shuaib85 · October 28, 2007, 2:16pm

Shuaib Z. wrote:


I don't know a good way to do it, but one way to get the result would be
to track the items in a hash, since a hash can't hold duplicate keys.

I’m sure there’s a better way to do it, but here’s what I got.

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 # it doesn't matter what we store
  end
}

puts duplicates

Cheers
Mohit.
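The idea above (remember what has been seen, flag anything seen again) maps naturally onto Ruby's standard-library `Set`. A sketch, assuming each duplicated item should be reported once: `Set#add?` returns `nil` when the element is already present, which doubles as the "seen before" test.

```ruby
require 'set'

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow"]
seen = Set.new
duplicates = Set.new   # a Set records each duplicated item only once

array.each do |item|
  # add? returns nil if item was already in the set
  duplicates << item unless seen.add?(item)
end

p duplicates.to_a   # prints ["apple", "cow"]
```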

shuaib85 · October 28, 2007, 2:42pm

On 28.10.2007 14:16, Mohit S. wrote:


irb(main):007:0> array = %w{apple banana apple orange}
=> ["apple", "banana", "apple", "orange"]
irb(main):008:0> array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
=> ["apple"]

Kind regards

robert
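The shorter `Hash.new(0)` here replaces the `Hash.new {|h,k| h[k] = 0 }` form used earlier. The two differ on a plain read: a default value is returned without being stored, while the block form stores the key. Because `ha[e] += 1` expands to `ha[e] = ha[e] + 1`, the assignment stores the key either way, so both work for counting. A small check:

```ruby
with_default = Hash.new(0)                   # missing keys read as 0
with_block   = Hash.new { |h, k| h[k] = 0 }  # missing keys are stored as 0

# A plain read returns 0 from both...
p with_default["apple"]        # prints 0
p with_block["apple"]          # prints 0

# ...but only the block form stored the key.
p with_default.key?("apple")   # prints false
p with_block.key?("apple")     # prints true

# `h[k] += 1` expands to `h[k] = h[k] + 1`, and that assignment stores
# the key, so Hash.new(0) is enough for counting.
with_default["pear"] += 1
p with_default.key?("pear")    # prints true
p with_default["pear"]         # prints 1
```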

shuaib85 · October 28, 2007, 2:19pm

Sean O’Halpin wrote:

I so have to get the hang of inject, flatten and map.

Cheers,
Mohit.
10/28/2007 | 9:16 PM.

shuaib85 · October 28, 2007, 2:48pm

Thanks a lot, guys.
It works.

I really appreciate your help.

Cheers
Shuaib

shuaib85 · October 28, 2007, 3:03pm

On 10/28/07, Robert K. [email protected] wrote:

Succinct ~and~ efficient! Do you have a mail filter checking for any
posts containing 'inject'?

Regards,
Sean

shuaib85 · October 28, 2007, 2:55pm

On 10/28/07, Shuaib Z. [email protected] wrote:


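arr,dup = ["apple", "banana", "apple", "orange"],[]
(arr.length-1).times do
  a = arr.shift               # take the next item off the front of arr
  dup << a if arr.include?(a) # it's a duplicate if it appears again later
end
p dup.uniq                    #=> ["apple"]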

Harry
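The loop above works by shifting items off `arr`, so afterwards only the array's last element remains. If the original array is still needed, the same idea (keep an item if it appears again later) can be written without mutating it. A sketch using `each_with_index` and `Array#drop`; like the original, it rescans the rest of the array for every item, so it is O(n^2):

```ruby
arr = ["apple", "banana", "apple", "orange"]
dups = []

arr.each_with_index do |a, i|
  # keep the item if it occurs again somewhere after position i
  dups << a if arr.drop(i + 1).include?(a)
end

p dups.uniq   # prints ["apple"]
p arr.size    # prints 4 (arr is left intact)
```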

shuaib85 · October 28, 2007, 6:47pm

On 10/28/07, Mohit S. [email protected] wrote:

> I so have to get the hang of inject, flatten and map.

Hi,

They are definitely worth looking into - inject in particular is a
powerful tool (Robert K. can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).

Regards,
Sean

# Mohit S. (with slight adjustment)
def duplicates_1(array)
  seen = { }
  duplicates = { }
  array.each {|item| seen.key?(item) ? duplicates[item] = true :
    seen[item] = true}
  duplicates.keys
end

# Robert K.
def duplicates_2(array)
  array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v|
    v==1}.keys
end

# from facets
def duplicates_3(array)
  array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end

require 'benchmark'
def do_benchmark(title, n, methods, *args, &block)
  puts '-' * 40
  puts title
  puts '-' * 40
  Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
    methods.each do |meth|
      x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
    end
  end
end

# get some data (Ubuntu specific I guess - YMMV)
array = File.read('/etc/dictionaries-common/words').split(/\n/)

# test w/o dups
do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2,
  :duplicates_3], array)

# create some duplicates
array = array[0...999] * 100
do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2,
  :duplicates_3], array)

__END__
$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    2.200000   0.010000   2.210000 (  2.215057)
duplicates_2    5.820000   0.000000   5.820000 (  5.812414)
duplicates_3    6.580000   0.010000   6.590000 (  6.586708)
----------------------------------------
duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    1.560000   0.000000   1.560000 (  1.562587)
duplicates_2    2.660000   0.000000   2.660000 (  2.665301)
duplicates_3    2.590000   0.000000   2.590000 (  2.595189)

shuaib85 · October 28, 2007, 7:16pm

Sean O’Halpin wrote:


Thanks Sean! Makes me feel quite nice about it.

So, hashes are faster than arrays?

Cheers,
Mohit.
10/29/2007 | 2:13 AM.

shuaib85 · October 28, 2007, 7:48pm

On 10/28/07, Mohit S. [email protected] wrote:

> So, hashes are faster than arrays?

It depends on what you're doing with them and how big they are. But in
this instance, I changed your solution to use a hash because you were
appending the duplicates to an array, which added an entry to that
array every time you detected a duplicate: an item that appears n times
ends up in the list n - 1 times. This didn't show up in your example
because your data contained at most two instances of an item. If you
change your example to:

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
  "apple", "apple"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 # it doesn't matter what we store
  end
}

puts duplicates

it outputs

apple
cow
apple
apple

which is probably not what you want.

Regards,
Sean

shuaib85 · October 28, 2007, 10:31pm


Here’s yet another way to do it:
http://snippets.dzone.com/posts/show/4148

Cheers,

j.k.

shuaib85 · October 29, 2007, 4:45am

Sean O’Halpin wrote:


Thanks for the explanation, Sean. Actually, I guess it’s not clear if
the OP wants to know each occurrence of the duplicates or just the list
of duplicates. But, there are now solutions for both cases!

Cheers,
Mohit.
10/29/2007 | 11:44 AM.
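Both readings can be served by a single pass: collect every repeat in order (each occurrence after the first), then call `uniq` when only the distinct duplicated items are wanted. A sketch using the longer example array from Sean's reply:

```ruby
array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow",
         "apple", "apple"]
seen = {}
repeats = []   # every occurrence after the first, in order

array.each do |item|
  repeats << item if seen.key?(item)
  seen[item] = true
end

p repeats        # prints ["apple", "cow", "apple", "apple"]
p repeats.uniq   # prints ["apple", "cow"]
```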

shuaib85 · October 29, 2007, 5:02am

From: Sean O’Halpin [mailto:[email protected]]


I just tested this using Ruby 1.9 on a P4 box running Windows XP. I
included Ruby's group_by and got surprising results.

C:\ruby1.9\bin>diff test-old.rb test.rb
19a20,24
>
> #1.9's group_by
> def duplicates_4(array)
>   array.group_by{|e|e}.select{|_,k| k.size>1}.keys
> end
26c31
< Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
---
> Benchmark.bmbm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
34c39
< array = File.read('/etc/dictionaries-common/words').split(/\n/)
---
> array = File.read('american-english').split(/\n/)
38c43
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)
43c48
< :duplicates_3], array)
---
> :duplicates_3,:duplicates_4], array)

C:\ruby1.9\bin>

C:\ruby1.9\bin>ruby test.rb
----------------------------------------
no duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    7.609000   0.094000   7.703000 (  7.984000)
duplicates_2   10.438000   0.109000  10.547000 ( 11.608000)
duplicates_3   14.609000   0.219000  14.828000 ( 14.874000)
duplicates_4   11.422000   0.141000  11.563000 ( 14.201000)
--------------------------------------- total: 44.641000sec

                    user     system      total        real
duplicates_1    7.219000   0.125000   7.344000 (  8.109000)
duplicates_2    9.844000   0.078000   9.922000 ( 10.374000)
duplicates_3   14.391000   0.172000  14.563000 ( 18.498000)
duplicates_4   11.172000   0.172000  11.344000 ( 12.998000)
----------------------------------------
duplicates
----------------------------------------
Rehearsal -------------------------------------------------
duplicates_1    3.375000   0.000000   3.375000 (  3.765000)
duplicates_2    3.218000   0.000000   3.218000 (  3.828000)
duplicates_3    3.250000   0.000000   3.250000 (  3.672000)
duplicates_4    2.032000   0.047000   2.079000 (  2.077000)
--------------------------------------- total: 11.922000sec

                    user     system      total        real
duplicates_1    3.375000   0.000000   3.375000 (  3.437000)
duplicates_2    3.188000   0.000000   3.188000 (  3.218000)
duplicates_3    3.219000   0.015000   3.234000 (  3.281000)
duplicates_4    1.844000   0.000000   1.844000 (  1.859000)

C:\ruby1.9\bin>

kind regards -botp
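The group_by version works because `group_by` returns a hash from each element to the array of its occurrences, and on Ruby 1.9 and later `Hash#select` also returns a hash, so `.keys` can be called on the result. On Ruby 2.2+ the identity block `{|e|e}` can be written with `Object#itself`. A sketch reusing the thread's method name (the rewrite with `itself` is an assumption about a newer interpreter, not what was benchmarked above):

```ruby
def duplicates_4(array)
  # group_by(&:itself) maps each element to an array of its occurrences,
  # e.g. "apple" => ["apple", "apple"]; keep groups with more than one entry
  array.group_by(&:itself).select { |_item, group| group.size > 1 }.keys
end

p duplicates_4(%w[apple banana apple orange])   # prints ["apple"]
```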

shuaib85 · October 29, 2007, 11:29am

2007/10/28, Sean O’Halpin [email protected]:

> Succinct ~and~ efficient!

Thanks!

> Do you have a mail filter checking for any posts
> containing 'inject'?

I don't need that, since most of them were written by me. (Slight
exaggeration.) *chuckle*

Kind regards

robert
