Deep cloning, how?

I am trying to figure out how to perform a deep clone:

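class A
  attr_accessor :name
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup       # shallow copy: a2's @name is the same String object as a1's
a1.name.chop!     # so mutating that String in place...
puts a2.name      # ...prints "yoyom", not "yoyoma"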

I found the following way to write a deep clone method:

class A
  attr_accessor :name
  def dup
    Marshal.load(Marshal.dump(self))
  end
end
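A minimal sketch of the Marshal approach, assuming everything reachable from the object can be marshaled. It uses a hypothetical method name, deep_copy, instead of overriding dup, so that dup keeps its usual shallow meaning:

```ruby
class A
  attr_accessor :name

  # Deep copy through a Marshal round trip: the whole object graph is
  # serialized and rebuilt, so the copy shares no objects with the original.
  # Objects that cannot be marshaled (Procs, IO handles, objects with
  # singleton methods) make Marshal.dump raise TypeError.
  def deep_copy
    Marshal.load(Marshal.dump(self))
  end
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.deep_copy
a1.name.chop!
puts a1.name   # yoyom
puts a2.name   # yoyoma
```

Marshal is the least code, but it copies everything reachable, which can be slow for large graphs and fails outright for unmarshalable objects.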

If I wanted to write my own specialized deep cloner, how would I do this? If I try the obvious way to do it,

def dup
  @name = self.name.dup
end

I get an error when ‘puts a2.name’ is executed saying:

NoMethodError: undefined method `name' for "yoyom":String

Can someone explain what’s going on when dup is called? What gets passed to dup (I assume self), and why is my code wrong?


Kind Regards,
Rajinder Y.

http://DevMentor.org
Do Good ~ Share Freely

Rajinder Y. wrote:

Can someone explain what’s going on when dup is called? What gets passed to dup (I assume self), and why is my code wrong?

dup is a method of your existing object, and should return the new
object instance.
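To see concretely why the version in the question fails, the sketch below reproduces it and inspects what dup actually returns. In Ruby a method returns the value of its last expression, and an assignment evaluates to the assigned object:

```ruby
class A
  attr_accessor :name

  # The version from the question. The last expression is an assignment,
  # so this returns the new String (not a copy of self) and, as a side
  # effect, replaces self's own @name with that String.
  def dup
    @name = self.name.dup
  end
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
p a2.class              # String
p a2.equal?(a1.name)    # true: a2 is the String now stored in a1
a1.name.chop!           # mutates that shared String...
p a2                    # ..."yoyom"
# a2.name would raise NoMethodError, since a2 is a String, not an A.
```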

class A
  attr_accessor :name
  def dup
    res = self.class.new   # build a fresh instance...
    res.name = name.dup    # ...and give it its own copy of the String
    res
  end
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
a1.name.chop!
puts a2.name   # prints "yoyoma"

There is a subtle distinction between ‘dup’ and ‘clone’ which I’ll leave to
someone else to explain…

2009/10/14 Rajinder Y. wrote:


If I wanted to write my own specialized deep cloner, how would I do this? If I try the obvious way to do it,

def dup
  @name = self.name.dup
end

No, this is by far not the obvious way, since at the very least you would
have to make sure a copy of self is created and returned from dup. I
would rather do

def dup
  copy = super            # Object#dup: shallow copy of self
  copy.name = @name.dup   # replace the shared String with a private copy
  copy
end

or even

def dup
  self.class.new(@name.dup)   # assumes #initialize accepts the name
end

That approach is fragile, though, since it depends on the code in #initialize.
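A sketch of why the constructor route is fragile, using a hypothetical Tracked class whose initialize has a side effect (a creation counter). The two approaches are written as hypothetical helper methods, copy_via_dup and copy_via_new, rather than as competing dup overrides:

```ruby
# Hypothetical class: its constructor has a visible side effect.
class Tracked
  attr_accessor :name

  @created = 0
  class << self
    attr_accessor :created   # how many times initialize has run
  end

  def initialize(name)
    @name = name
    Tracked.created += 1
  end

  # Like the "copy = super" version: Object#dup copies the instance
  # variables without calling initialize, then the shared String is replaced.
  def copy_via_dup
    copy = dup
    copy.name = @name.dup
    copy
  end

  # Like the "self.class.new" version: goes through the constructor, so any
  # side effects in initialize run again (and its signature must fit).
  def copy_via_new
    self.class.new(@name.dup)
  end
end

t = Tracked.new("yoyoma")
a = t.copy_via_dup
p Tracked.created   # 1
b = t.copy_via_new
p Tracked.created   # 2
```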


Hope the above sheds some light.

Kind regards

robert

On 10/14/09, Robert K. wrote:


def dup
  self.class.new(@name.dup)
end

That approach is fragile, though, since it depends on the code in #initialize.

The documentation of Object#dup seems to suggest that subclasses
should not override dup, preferring to override clone instead. I’m not
sure why this should be or why overriding dup would be bad. But
anyway, I would suggest this:

def deep_clone
  copy = clone
  copy.name = @name.dup
  copy
end

Just so you can keep the existing semantics of clone as a shallow copy.
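A self-contained sketch of this suggestion, compared with a plain clone. It assumes the object is not frozen: clone preserves the frozen state, and the assignment inside deep_clone would then fail:

```ruby
class A
  attr_accessor :name

  # Shallow clone first, then replace the one shared reference.
  # If self is frozen, the clone is frozen too and copy.name= raises
  # (FrozenError on Ruby 2.5+, RuntimeError before that).
  def deep_clone
    copy = clone
    copy.name = @name.dup
    copy
  end
end

a1 = A.new
a1.name = "yoyoma"
shallow = a1.clone
deep = a1.deep_clone
a1.name.chop!
puts shallow.name   # yoyom  (shares a1's String)
puts deep.name      # yoyoma (has its own String)
```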

I’m really not sure why there are 2 methods to create shallow copies
in ruby and what all the differences are supposed to be. Other than
not overriding dup(?), the only other difference between them that I
can discover is that clone copies the metaclass of the object, whereas
dup reverts the copy’s metaclass to being just its class. I’ve been
wondering about the difference between the 2 recently; I hope someone
out there can provide some enlightenment on why there are 2 and what
the differences are.

On Wed, Oct 14, 2009 at 7:30 PM, Robert K. wrote:


Also IIRC dup does not copy singleton methods and clone does.
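A quick check of the singleton-method difference, using a plain Object with one method defined directly on it:

```ruby
obj = Object.new

# A singleton method: defined only on this one object.
def obj.greet
  "hi"
end

cloned = obj.clone
duped  = obj.dup

p cloned.respond_to?(:greet)   # true:  clone copies the singleton class
p duped.respond_to?(:greet)    # false: dup starts from the plain class
```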


Paul S.
http://www.nomadicfun.co.uk


On 14.10.2009 19:03, Caleb C. wrote:

The documentation of Object#dup seems to suggest that subclasses
should not override dup, preferring to override clone instead.

Where did you get that from? In the docs referenced below I cannot find
anything like that. The only indication I can see is that #dup uses
#initialize_copy, and we should probably override that instead of #dup
itself.

just so you can keep the existing semantics of clone as a shallow copy.

I’m really not sure why there are 2 methods to create shallow copies
in ruby and what all the differences are supposed to be. Other than
not overriding dup(?), the only other difference between them that I
can discover is that clone copies the metaclass of the object, whereas
dup reverts the copy’s metaclass to being just its class. I’ve been
wondering about the difference between the 2 recently; I hope someone
out there can provide some enlightenment on why there are 2 and what
the differences are.

There are more differences, namely in the handling of frozen and tainted
state.

http://www.ruby-doc.org/core/classes/Object.html#M000351
http://www.ruby-doc.org/core/classes/Object.html#M000352
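The frozen-state difference can be checked directly. The freeze: keyword for clone exists only in newer Rubies (2.4 and later), and taint tracking, the other state mentioned here, was deprecated in Ruby 2.7 and removed in 3.2, so only frozen state is shown:

```ruby
s = String.new("yoyoma").freeze

p s.clone.frozen?                 # true:  clone keeps the frozen state
p s.dup.frozen?                   # false: dup returns an unfrozen copy
p s.clone(freeze: false).frozen?  # false: explicit opt-out (Ruby 2.4+)
```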

Kind regards

robert

Caleb C. wrote:


def deep_clone
  copy = clone
  copy.name = @name.dup
  copy
end

Cool: do a shallow clone and then a specialized deep copy. I like this! I
hadn’t gotten as far as you did with dup versus clone, or with not redefining
dup; this is news to me, but something worth looking into.


Kind Regards,
Rajinder Y.


Brian C. wrote:

  def dup
    res = self.class.new
    res.name = name.dup
    res
  end
end

Brian, I am starting to see where I went wrong. This clears it up,
thanks!



Kind Regards,
Rajinder Y.


Robert K. wrote:


Thanks for the links and solutions, Robert. This thread got some good
replies.


Kind Regards,
Rajinder Y.


On 10/15/2009 08:07 PM, Caleb C. wrote:

initialize_copy apparently is used by both dup and clone. You’re
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

Me, too. :-) Just for reference:

irb(main):019:0> class X
irb(main):020:1> def initialize_copy(*a) p [self,a] end
irb(main):021:1> end
=> nil
irb(main):022:0> x=X.new
=> #<X:0x8670108>
irb(main):023:0> x.dup
[#<X:0x865d9cc>, [#<X:0x8670108>]]
=> #<X:0x865d9cc>
irb(main):024:0> x.clone
[#<X:0x864ddb0>, [#<X:0x8670108>]]
=> #<X:0x864ddb0>

And it’s noteworthy that frozen state is only established after
initialize_copy has returned:

irb(main):025:0> class X
irb(main):026:1> def initialize_copy(old)
irb(main):027:2> @x=1
irb(main):028:2> end
irb(main):029:1> end
=> nil
irb(main):030:0> X.new.freeze.clone
=> #<X:0x86451b0 @x=1>

There are more differences namely in the area of frozen and tainted state.


Ah, yes. But only frozen state, not tainted.

Right you are.

Kind regards

robert

Robert K. wrote:


And it’s noteworthy that frozen state is only established after
initialize_copy has returned:

Caleb, Robert, thanks for bringing this point home. I keep having to
update my Ruby notes =) … let me try the initialize_copy way!


Kind Regards,
Rajinder Y.


On 10/14/09, Robert K. wrote:

On 14.10.2009 19:03, Caleb C. wrote:

The documentation of Object#dup seems to suggest that subclasses
should not override dup, preferring to override clone instead.

Where did you get that from? In the docs referenced below I cannot find
anything like that. The only indication I can see is that #dup uses
#initialize_copy, and we should probably override that instead of #dup
itself.

I’m looking at these 2 sentences:

 In general, +clone+ and +dup+ may have different
 semantics in descendent classes. While +clone+ is used to duplicate
 an object, including its internal state, +dup+ typically uses the
 class of the descendent object to create the new instance.

Frankly, I’ve never been real sure what this is supposed to mean, so
my reading may well be wrong. In fact, it probably is.

initialize_copy apparently is used by both dup and clone. You’re
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

There are more differences namely in the area of frozen and tainted state.


Ah, yes. But only frozen state, not tainted.

Caleb C. wrote:

 In general, +clone+ and +dup+ may have different
 semantics in descendent classes. While +clone+ is used to duplicate
 an object, including its internal state, +dup+ typically uses the
 class of the descendent object to create the new instance.

Frankly, I’ve never been real sure what this is supposed to mean

I think what it means is: clone just copies all the instance
variables, whilst dup calls self.class.new().

It’s quite common for initialize() to have all sorts of side effects,
creating new objects and so on. So you can expect dup to do all this,
whilst you can expect clone to create an identical object with all the
instance variables pointing at the same objects.

Nothing enforces that of course, so it’s just a convention.

The only real differences I can see are:

  • clone also copies the frozen state of the object
  • clone makes a copy of the singleton class

(whereas in dup, by default the newly-created object has an empty
singleton class; it’s assumed that if there are any methods to be added
to that, your own dup method will do that for you, possibly with the
assistance of your initialize method)

initialize_copy apparently is used by both dup and clone. You’re
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

I’m not sure I agree with that. The default implementation of both dup
and clone does this, as it’s the only reasonable thing for Object to do
without any knowledge of its subclasses. But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Since I never use clone, it’s a moot point for me as to what it should
do in a subclass.

Regards,

Brian.

But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

I’d understand the description in such a way that the user should
override neither #dup nor #clone, but should instead create an
#initialize_copy method to implement anything class-specific (including
a non-shallow copy). Since that method is called by both #clone and #dup,
and the frozen/tainted state could easily be reset, I personally still
don’t quite understand why there are two methods.

2009/10/16 lith wrote:

But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen, because frozen state is not yet applied
while #initialize_copy runs.
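A sketch of this point, using a hypothetical one-attribute class. Because #initialize_copy runs before clone re-applies the frozen state, the copy can still get its own String; patching the copy afterwards cannot:

```ruby
class A
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Called on the fresh copy by both dup and clone. The instance variables
  # have already been copied (shallowly), and clone only re-applies the
  # frozen state after this method returns, so @name can still be replaced.
  def initialize_copy(source)
    super
    @name = source.name.dup
  end
end

a1 = A.new(String.new("yoyoma")).freeze
a2 = a1.clone
p a2.frozen?               # true
p a2.name.equal?(a1.name)  # false: the copy has its own String

# Patching a frozen clone after the fact is not possible.
begin
  a1.clone.instance_variable_set(:@name, String.new("x"))
rescue RuntimeError => e   # FrozenError, a RuntimeError subclass, on 2.5+
  p e.class
end
```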

I’d understand the description in such a way that the user should
override neither #dup nor #clone, but should instead create an
#initialize_copy method to implement anything class-specific (including
a non-shallow copy).

Also for a shallow copy, in order to avoid aliasing! IMHO a proper setup
looks like this:

class A
  attr_reader :x
  attr_accessor :y

  def initialize
    @x = []
    @y = 10
  end

  def initialize_copy(source)
    super
    # p self
    @x = source.x.dup
  end
end

class B < A
  attr_accessor :z

  def initialize
    super()
    @z = {}
  end

  def initialize_copy(source)
    super
    @z = source.z.dup
  end
end

Note that the copy is initialized with the same set of references when
entering #initialize_copy, so you need only deal with members that
could cause aliasing issues (unfrozen strings and collections, for
example).
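Both halves of that note can be observed with a small hypothetical class K that records what ran: dup and clone go through #initialize_copy without calling #initialize again, and on entry the copy still shares the source's references:

```ruby
class K
  attr_reader :calls, :shared_on_entry

  def initialize
    @calls = [:initialize]
  end

  def initialize_copy(source)
    super
    # Record whether the copy still points at source's Array on entry.
    @shared_on_entry = @calls.equal?(source.calls)
    # Give the copy its own Array so the two objects stop aliasing it.
    @calls = @calls + [:initialize_copy]
  end
end

k = K.new
d = k.dup
c = k.clone
p k.calls            # [:initialize]
p d.calls            # [:initialize, :initialize_copy] (initialize ran once)
p c.calls            # [:initialize, :initialize_copy]
p d.shared_on_entry  # true
```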

Since that method is called by both #clone and #dup, and the
frozen/tainted state could easily be reset, I personally still don’t
quite understand why there are two methods.

You cannot reset frozen state - for good reasons.

Kind regards

robert

On 10/16/09, Brian C. wrote:

It’s quite common for initialize() to have all sorts of side effects,
creating new objects and so on. So you can expect dup to do all this,
whilst you can expect clone to create an identical object with all the
instance variables pointing at the same objects.

Object#dup does not call new; I think it’s more like:
self.class.allocate.initialize_copy(self). See what happens here:

irb(main):001:0> class K
irb(main):002:1> def initialize
irb(main):003:2> p :initialize
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> k=K.new
:initialize
=> #<K:0xb7ce8ee0>
irb(main):008:0> k2=k.dup
=> #<K:0xb7ce0f38>

My reading of those 2 sentences I quoted has now changed. Now I
believe that all it’s saying is that clone copies the singleton class
whereas dup reverts the copy to the object’s original class. Tho I
still don’t fully understand what ‘internal state’ is supposed to
mean. Are instance variables not part of the internal state? Yet both
dup and clone copy them.

Since I never use clone, it’s a moot point for me as to what it should
do in a subclass.

I never used to use clone either, til I discovered a case where I
needed to copy the singleton class. Now I’m of the opinion that one
should default to clone when a copy is needed, and fall back to dup
only when clone is unsuitable.

lith wrote:

that method is called by #clone and #dup and the frozen/tainted state
could be easily reset, I personally still don’t quite understand why
there are two methods.

This is exactly the approach I am now following after the various
discussions and insights. I just leave dup and clone alone so they keep
their default behavior; they both end up calling initialize_copy, and you
don’t get some bizarre Frankenstein clone from redefining dup or clone. I
am thinking about someone else using my code and what will cause them less
headache in the end.


Kind Regards,
Rajinder Y.


Robert K. wrote:

I’d understand the description in such a way that the user should
override neither #dup nor #clone, but should instead create an
#initialize_copy method to implement anything class-specific (including
a non-shallow copy).

Also for a shallow copy, in order to avoid aliasing! IMHO a proper setup
looks like this:

Robert, I like this setup, thanks for the sample code to look over. I just
discovered why adding ‘super’ is important; it was missing from my notes,
an oversight on my part.

Is it sufficient to call ‘super’ rather than ‘super source’ if you are
passing stuff up the hierarchy’s construction chain?

I am going to conjecture that ‘super’ ends up becoming ‘super self’, which
makes sense because the parent constructor doesn’t care about subclass data
members. Does that make any sense to you?



Kind Regards,
Rajinder Y.


On 10/16/2009 09:45 PM, Rajinder Y. wrote:

Is it sufficient to call ‘super’ rather than ‘super source’ if you are
passing stuff up the hierarchy’s construction chain?

You seem to be mixing two things: super in #initialize and super in
#initialize_copy. In #initialize_copy you can simply write “super”
(without parentheses), because that makes sure the argument list is
propagated. You can do this because #initialize_copy always has exactly
one argument: the object that was duped / cloned.

In the constructor I explicitly wrote “super()” because the superclass
#initialize does not take arguments, and a bare “super” will break as soon
as you add parameters to the subclass constructor. Of course, if you
change both classes in parallel you can stick with “super”.
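A sketch of the constructor case, with hypothetical classes Base, Child and BareSuperChild. A bare super forwards the subclass's arguments unchanged, so it breaks once the subclass constructor takes a parameter the superclass does not:

```ruby
class Base
  attr_reader :base

  def initialize
    @base = true
  end
end

class Child < Base
  def initialize(extra)
    super()        # explicit empty argument list: Base#initialize takes none
    @extra = extra
  end
end

class BareSuperChild < Base
  def initialize(extra)
    super          # bare super forwards `extra` to Base#initialize
    @extra = extra
  end
end

p Child.new(1).base   # true
begin
  BareSuperChild.new(1)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```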

I am going to conjecture that ‘super’ ends up becoming ‘super self’, which
makes sense because the parent constructor doesn’t care about subclass data
members. Does that make any sense to you?

No. Neither for #initialize nor for #initialize_copy you want self as
argument to super.

Kind regards

robert

And after another experiment or two, it would appear that the depth of
the copy produced by either dup or clone is the same and depends
entirely on what initialize_copy does.


Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale