I’ve made the following observations about Ruby’s apparent
implementation:
Integers in the range -2^30 to 2^30-1 are Fixnums. Integers
outside that range are Bignums.
For Fixnum “i”, i’s object_id is twice i’s value plus 1. Said
another way, i’s object_id is i’s binary value left-shifted with 1 as
padding.
Object_ids of Bignums close to the boundaries of Fixnums bear no
apparent relationship to the Fixnum object_ids.
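The second observation is easy to check from irb. What follows is a minimal sketch, assuming MRI (CRuby); other implementations need not follow the formula, and on 64-bit builds the immediate range is wider than the one quoted here. On Ruby 2.4 and later Fixnum and Bignum are folded into Integer, but MRI still treats small integers as immediates. The helper name `fixnum_formula_holds?` is made up for illustration.

```ruby
# Assumes MRI (CRuby): small integers are immediate values whose
# object_id is derived from the value as 2 * i + 1.
def fixnum_formula_holds?(i)
  i.object_id == 2 * i + 1
end

# Print the observed id next to the value predicted by the formula.
[-2, -1, 0, 1, 2, 100].each do |i|
  puts "#{i}.object_id = #{i.object_id}, 2*i+1 = #{2 * i + 1}"
end
```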
These observations suggest to me that an assignment statement like
“a=1” requires the interpreter to do no more than:
1. Search for the identifier “a” in the current scope.
2. If found, use it; if not, enter it in the current scope and then
use it.
3. “Use it” means take the binary representation of the assigned
value, left shift it with “1” padding, and make that the current
object_id of the identifier “a”.
There is no need to locate free space in a memory pool where
object values for Bignums, Strings and other objects are stored. For
Fixnums, the object_id IS the value, just a little shifted.
Is my speculation about this aspect of Ruby’s implementation correct,
or am I all wet?
Thanks in advance for any comments you may wish to offer,
Richard
On Dec 21, 2009, at 10:55 AM, RichardOnRails wrote:
Is my speculation about this aspect of Ruby’s implementation correct,
or am I all wet?
Generally correct. You might want to look at http://en.wikipedia.org/wiki/Tagged_pointer for a discussion of this
overall technique. This technique is one way around ‘boxing’ primitive
types such as small integers, boolean values, nil, and symbols.
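To make the tagging idea concrete, here is a rough sketch in plain Ruby of the bit manipulation involved. It only imitates the scheme described in this thread (low bit set means "this word is the integer itself"); the constant and helper names are hypothetical and are not MRI internals.

```ruby
# Illustration only: mimics the tagging scheme discussed in the thread.
# TAG_BIT and the helper names are hypothetical, not MRI internals.
TAG_BIT = 1

def tag_small_int(n)
  (n << 1) | TAG_BIT            # shift left, set the low bit as the tag
end

def small_int_word?(word)
  (word & TAG_BIT) == TAG_BIT   # low bit set: the word carries the integer
end

def untag_small_int(word)
  word >> 1                     # arithmetic shift restores value and sign
end

puts tag_small_int(1)                     # 3
puts untag_small_int(tag_small_int(-5))   # -5
```

A word with the low bit clear would, in this toy model, be treated as an ordinary pointer to a heap object.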
For Fixnum “i”, i’s object_id is twice i’s value plus 1. Said
another way, i’s object_id is i’s binary value left-shifted with
1 as padding.
Right, Fixnums are immediate in Ruby (at least in MRI).
Object_id’s of Bignum’s close to the boundaries of Fixnum’s
bear no apparent relationship to the Fixnum object_id’s.
Right, because Bignums are ‘regular’ objects.
For Fixnums, the object_id IS the value, just a little shifted.
Spot on.
There was a very nice talk about these things; you might want to google
for “Understanding Ruby’s Object Model” (the top links do not open for
me at the moment, so I’m not sure this is the talk I think of).
I’ve made the following observations about Ruby’s apparent
implementation:
Integers in the range -2^30 to 2^30-1 are Fixnums. Integers
outside that range are Bignums.
For Fixnum “i”, i’s object_id is twice i’s value plus 1. Said
another way, i’s object_id is i’s binary value left-shifted with 1 as
padding.
Object_id’s of Bignum’s close to the boundaries of Fixnum’s bear no
apparent relationship to the Fixnum object_id’s.
All correct.
For Fixnums, the object_id IS the value, just a little shifted.
Is my speculation about this aspect of Ruby’s implementation correct,
or am I all wet?
Hm… While you have many things right, I don’t fully agree. No
variable (be it a local variable like “a” in your example or an
instance variable like “@a”) has an object id. Object id is a
property of an object which might be referenced by many variables.
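A small sketch of that distinction, assuming nothing beyond core Ruby: the id is reported by the object, so two variables holding the same reference report the same id, while an equal but separate object reports a different one. The method name `object_identity_demo` is made up for this example.

```ruby
# object_id belongs to the object, not to the variable that refers to it.
def object_identity_demo
  s = String.new("abc")
  t = s                    # t holds the same reference as s
  u = String.new("abc")    # equal content, but a distinct object
  {
    same_via_alias: s.object_id == t.object_id,
    same_via_copy:  s.object_id == u.object_id,
    equal_content:  s == u
  }
end

p object_identity_demo
```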
Your step 3 actually mixes two separate things: evaluation of the
expression of the right side of “=” and assignment. “Using” in an
assignment means to take whatever the expression spits out and store
it in the variable.
Now, what your right hand expression yields is an object reference.
For optimization purposes some object references are special in that
they actually are the object (these are the “immediate values” which
Gary mentioned, Fixnums for example). This does not make special
treatment for assignments necessary. Rather, it makes special
treatment for method calls necessary. Because then the interpreter
does not have to look up the object on the heap etc.
Someone with more intimate knowledge of the implementation might be
able to explain this better. But I believe it’s important to point
out that the difference is not in the assignment but in the right hand
side expression which - in the case of a Fixnum - yields a special
object reference.
On Dec 21, 2009, at 10:55 AM, RichardOnRails wrote:
Is my speculation about this aspect of Ruby’s implementation correct,
or am I all wet?
Generally correct. You might want to look at http://en.wikipedia.org/wiki/Tagged_pointer for a discussion of this overall technique. This technique is one way around ‘boxing’ primitive types such as small integers, boolean values, nil, and symbols.
Gary W.
Hi Gary,
Generally correct.
Thanks for the encouragement.
Tagged_pointer
Thanks for the reference. I hadn’t heard of the term before.
robert
–
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Hi Robert,
No variable … has an object id.
Robert, I know, by virtue of reading a number of your responses in
this NG, that you’re a Ruby expert. But the statement referenced
above is contradicted by at least two authorities on Ruby:
“… every object has a unique object identifier (abbreviated as
object ID)” [“Programming Ruby”, Second Edition, Thomas, et al, p.
12], where you praised the book’s usefulness :-). Of course, the
“id” method has since been deprecated in favor of “object_id” when we
want to reference an object ID.
“… an object in Ruby has an identity: "abc".object_id # 53744407
This object ID is of limited utility.” [“The Ruby Way”, Second
Edition, Fulton, p.26]
step 3 actually mixes two separate things: evaluation of the expression of the right side of “=” and assignment
Granted, I stuck to what I viewed as the essential issues. Perhaps a
more complete estimate might read.
3.1 Ruby interprets “a = 1” to mean “a.=(1)”, i.e. invokes the “=”
method on the object. Normally, that reference will cascade up to
Object.=, I suppose.
3.1a If “a” is undefined in the current scope, “a” will be added to
the scope with an ID of (2a+1) = 3.
3.1b Otherwise:
3.1.b.1 If a’s ID is in the range of Fixnums, (or a’s ID indicates
it’s true, false, etc .), a’s ID will be set to 3
3.1b.2 Otherwise, if a’s ID indicates any other type, a’s ID will be
pushed to the garbage stack and a’s ID will be set to 3
Is that better?
For optimization purposes some object references are special in that they actually are the object
That sounds like you mean in the case of Fixnum “1”, that the number
1 IS the ID as well as the value of the object. But we can see “puts
1.object_id” => 3
So I think that fact (plus all the other evidence I offered) suggests
that ID’s for Fixnum’s are manufactured as 2*value+1
QED???
As always, thanks for your insightful response.
Richard
Hi Robert,
No variable … has an object id.
Robert, I know, by virtue of reading a number of your responses in
this NG, that you’re a Ruby expert. But the statement referenced
above is contradicted by at least two authorities on Ruby:
“… every object has a unique object identifier
Right. The object has an object_id. The variable that contains it
does not. The difference is subtle but important.
Best,
–
Marnen Laibow-Koser
http://www.marnen.org
–
Posted via http://www.ruby-forum.com/.
Hi Marnen,
Thanks for that confirmation.
Just to make it perfectly clear: I’m confirming that Robert is right,
and that there’s no contradiction with the other articles you found. I
am not confirming what I take to be your hypothesis.
“… an object in Ruby has an identity: "abc".object_id # 53744407
This object ID is of limited utility.” [“The Ruby Way”, Second
Edition, Fulton, p.26]
As Marnen pointed out already, there is no contradiction between what
I said and what the authorities said. A variable is just a place to
store an object reference. Other than that it does not have
properties, definitely no id. What is returned by invocation of
#object_id is attached to the object at hand and not to the variable.
Whether the object is an immediate value such as a Fixnum or a
“regular” object (such as String) does not really matter with regard
to the variable.
step 3 actually mixes two separate things: evaluation of the expression of the right side of “=” and assignment
Granted, I stuck to what I viewed as the essential issues. Perhaps a
more complete estimate might read.
3.1 Ruby interprets “a = 1” to mean “a.=(1)”, i.e. invokes the “=”
method on the object. Normally, that reference will cascade up to
Object.=, I suppose.
Now that is completely wrong. “a=1” is an assignment meaning, it
takes whatever object reference the evaluation of the expression on
the right side yields and puts it into the storage location denoted by
“a”. There is no method invoked, certainly no method on “a” which -
as I said - is no object but just a place in storage.
3.1a If “a” is undefined in the current scope, “a” will be added to
the scope with an ID of (2a+1) = 3.
3.1b Otherwise:
3.1.b.1 If a’s ID is in the range of Fixnums, (or a’s ID indicates
it’s true, false, etc .), a’s ID will be set to 3
3.1b.2 Otherwise, if a’s ID indicates any other type, a’s ID will be
pushed to the garbage stack and a’s ID will be set to 3
Is that better?
I’m sorry, no. It’s rather:
1. Make sure local scope has a place for variable “a”.
2. Evaluate the expression to the right side of “=”.
3. Take the resulting object reference of step 2 and store it in the
location for variable “a”.
Steps 1 and 2 can actually be exchanged, it does not really matter much.
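The effect of "store the reference" can be observed from Ruby itself. This is only a sketch with a made-up method name, `assignment_demo`: after `b = a` both variables hold the same reference, so a mutation through one is visible through the other, while rebinding `a` leaves `b` alone.

```ruby
def assignment_demo
  a = String.new("abc")   # evaluate the right side, store the reference in a
  b = a                   # copy the reference, not the String
  b << "def"              # mutate the one object both variables refer to
  seen_through_a = a.dup  # snapshot of what a sees at this point
  a = 1                   # rebind a; b still refers to the String
  [seen_through_a, a, b]
end

p assignment_demo   # ["abcdef", 1, "abcdef"]
```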
For optimization purposes some object references are special in that they actually are the object
That sounds like you mean in the case of Fixnum “1”, that the number
1 IS the ID as well as the value of the object. But we can see “puts
1.object_id” => 3
No, it means that in the case of Fixnum the reference is the object.
See the article Gary referenced for further detail:
http://en.wikipedia.org/wiki/Tagged_pointer
So I think that fact (plus all the other evidence I offered) suggests
that ID’s for Fixnum’s are manufactured as 2*value+1
QED???
The way object ids come into existence is completely unrelated to the
process of variable assignment. Since in Ruby any object needs an id
and Fixnums (plus a few more) are immediate values (which means there
is just an object reference and no object, which btw also means that
there is no place to store state and hence Fixnums are immutable) it
was chosen to derive object ids according to the formula you found
out.
There is probably also a technical reason for this formula which
likely has to do with the fact that other objects need ids as well and
calculation of them should be efficient but I believe it is more
important to understand the object - object reference dichotomy.
Actually, I believe all this reasoning about Fixnums being immediate
values is totally overdone. From a Ruby programmer’s perspective it
is completely irrelevant (if you put performance aside for the
moment). It is sufficient to know that Ruby has variables which
contain object references and that evaluation of whatever expression
yields an object reference. The programming model is as simple as
that. Immediate values are really just an optimization under the hood
to speed up math and other common operations. In Ruby land, you have
no chance to distinguish an immediate value from any other immutable
object - whatever methods you invoke (and there are quite a few for
Fixnum) the object simply does not change its state. Granted, there
are a few things that do not work, for example defining a finalizer
for a Fixnum but even that raises an ordinary exception.
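The finalizer case can be tried directly. A minimal sketch, assuming MRI; the exact exception class is not spelled out here because it may differ between versions and implementations, so the helper (a made-up name, `finalizer_error_for`) just reports whatever class is raised.

```ruby
# Returns the class of the exception raised when attaching a finalizer,
# or nil if the finalizer was accepted.
def finalizer_error_for(obj)
  ObjectSpace.define_finalizer(obj, proc { })
  nil
rescue StandardError => e
  e.class
end

p finalizer_error_for(1)                 # an exception class for an immediate
p finalizer_error_for(String.new("x"))   # nil: ordinary objects accept one
```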
No, it means that in the case of Fixnum the reference is the object.
A very good way to express this I think.
Thank you!
But this doesn’t need to be the case, in an implementation which used
a different object model for, say GC, and which interposed an
indirection, then boxed objects might have an object id field kept
with the indirection, so that the object state could be moved without
affecting the object_id.
Other Ruby implementations like JRuby, Rubinius, Maglev … might
implement such stuff in a way similar to MRI, possible with subtle
variations, or some might do it radically differently.
Absolutely. Again, when in Ruby land it does not really matter how a
particular implementation does it as long as the contract stays
roughly the same (i.e. #object_id returns something integerish).
Fixnum) the object simply does not change its state. Granted, there
are a few things that do not work, for example defining a finalizer
for a Fixnum but even that raises an ordinary exception.
Agreed, the only time it’s really important is edge-cases, and when
writing extensions.
When writing extensions we’re leaving Ruby land and so, yes, chances
are that you better know how things work under the hood then. Do you
have any particular edge cases in Ruby land in mind? Off the top of
my head I cannot think of any.
“a”. There is no method invoked, certainly no method on “a” which -
as I said - is no object but just a place in storage.
Absolutely correct!
1 IS the ID as well as the value of the object. But we can see “puts
1.object_id” => 3
No, it means that in the case of Fixnum the reference is the object.
A very good way to express this I think.
There is probably also a technical reason for this formula which
likely has to do with the fact that other objects need ids as well and
calculation of them should be efficient but I believe it is more
important to understand the object - object reference dichotomy.
These things are artifacts of the implementation of the language. I
believe that in MRI the object_id is just the reference value ‘cast’
to a Fixnum (or maybe Integer). So for an immediate object it’s a
particular bit pattern interpreted as an integer, and for boxed
objects it’s the address of the object’s state interpreted as an
integer.
But this doesn’t need to be the case, in an implementation which used
a different object model for, say GC, and which interposed an
indirection, then boxed objects might have an object id field kept
with the indirection, so that the object state could be moved without
affecting the object_id.
Other Ruby implementations like JRuby, Rubinius, Maglev … might
implement such stuff in a way similar to MRI, possible with subtle
variations, or some might do it radically differently.
Fixnum) the object simply does not change its state. Granted, there
are a few things that do not work, for example defining a finalizer
for a Fixnum but even that raises an ordinary exception.
Agreed, the only time it’s really important is edge-cases, and when
writing extensions.
Hi Robert and Rick,
Thanks for honoring me with responses to what you guys must view as
drivel. And I accept the assertion that Ruby recognizes “a=1” as an
assignment statement rather than invoking an apparently mythical “=”
operator.
Actually, I believe all this reasoning about Fixnums being immediate
values is totally overdone. From a Ruby programmer’s perspective it
is completely irrelevant (if you put performance aside for the
moment). [snip]
Agreed.
I see your point. But I’ve got a “burr under my intellectual saddle.”
Both Robert and Matz responded a few weeks ago on a question I raised
about implementing a Fixnum ++ operator in Ruby functionally
equivalent to C’s prefix_++ operator. I’m not lobbying for such a
change; I merely suggested that it’s possible. The response was such
an operator would be functionally equivalent to “turning 1 into 2.”
But I see the following:
a = 1 # ID = 3
a = 2 # ID = 5
b = 1 # ID = 3
b++ # implemented as b.++() which increments b’s object’s ID by 2
and checks for edge cases; I don’t see such an implementation as
disturbing any other object, at least none on my machine’s hardware
and software state.
“a=1” is an assignment meaning, it
takes whatever object reference the evaluation of the expression on
the right side yields and puts it into the storage location denoted by
“a”. There is no method invoked, certainly no method on “a” which -
as I said - is no object but just a place in storage.
But aside from copying (in this case) 1’s ID, Ruby must first look up
a’s presence in the current scope’s portion of the symbol table and, if
not present, insert it, or otherwise deal with the existing ID for the
“a” entry: (i) if the ID is an actual address of data in some pool,
decrement its reference count if positive; or (ii) do nothing for
Fixnums, true, false and nil.
Looking at the actual IDs used on my 1.8.6.x version of Ruby on my
Windows XP machine, I see these two cases are easily disambiguated:
(i) ID is an even number greater than 4, perhaps (to account for
false, true, nil, which are 0, 2, 4, respectively) and (ii) everything
else.
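For anyone who wants to repeat the observation, a short sketch that prints the ids of the special values. The exact numbers depend on the Ruby version and platform, so none are asserted here; the one property relied on is the MRI formula 2*i+1 for small integers, which makes their ids odd. The helper name `small_int_id?` is made up.

```ruby
# Exact values vary by Ruby version and platform; this post reports
# 0, 2 and 4 for false, true and nil on Ruby 1.8.6.
[false, true, nil, 0, 1, 2].each do |obj|
  puts "#{obj.inspect}.object_id = #{obj.object_id}"
end

# On MRI, ids of immediate small integers are 2*i + 1, hence always odd.
def small_int_id?(id)
  id.odd?
end

puts small_int_id?(7.object_id)
```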
I wouldn’t even have posted my ++ question if I could follow my way
through Ruby’s C or C++ implementation. But in about my last decade
before retirement, I was the debugger/enhancer of several
organizations’ apps, so I know how daunting the hunt through C/C++ code
is, especially an app as big as Ruby. But if you pointed me to Ruby’s
Finite State Machine (there’s GOT to be one), I’d …
With respect to you both, as well as thanks,
Richard
Thanks for honoring me with responses to what you guys must view as
drivel. And I accept the assertion that Ruby recognizes “a=1” as an
assignment statement rather than invoking an apparently mythical “=”
operator.
On syntax level it’s an operator as many others but underneath it is
quite different. If you think about it for a moment this is the only
operation which can change an object’s state for all non built in
classes.
Actually, I believe all this reasoning about Fixnums being immediate
values is totally overdone. From a Ruby programmer’s perspective it
is completely irrelevant (if you put performance aside for the
moment). [snip]
Agreed.
I see your point. But I’ve got a “burr under my intellectual saddle.”
That must hurt…
b++ # implemented as b.++() which increments b’s object’s ID by 2
and checks for edge cases;
Please forget your idea of object ids being modified. An object id is
just a derived bit of data. And, btw, object ids never change
because the identity of an object does not change.
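A quick way to see this, sketched with a made-up method name: change an object's state as much as you like and its id stays put.

```ruby
# State changes; identity (and therefore object_id) does not.
def identity_is_stable?
  s = String.new("abc")
  before = s.object_id
  s << "def"      # mutate in place
  s.upcase!       # mutate again
  before == s.object_id
end

puts identity_is_stable?   # true
```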
I don’t see such an implementation as
disturbing any other object, at least none on my machine’s hardware
and software state.
You are probably overlooking effects on consistency of the whole
language. While it would be technically doable to introduce ++ into
the language the consequences are undesirable.
prefix ++ operator would change the state of an instance if you
wanted to roughly retain C++ semantics.
This means, that for other (non Fixnum) classes this will happen:
a = anything_but_Fixnum()
b = a
++a
b == a # => true
b.equal? a # => true, identity does not change
But, if what you wrote were followed (no other object is affected):
a = any_Fixnum()
b = a
++a
b == a # => false
b.equal? a # => false, identity has changed
Also, if Fixnums should behave like other classes, suddenly a
Fixnum becomes mutable, which cannot be since Fixnums are immediate
and do not have a place to store state that can change.
The only salvation is to make “++a” syntactic sugar for “a += 1” or
rather “a = a + 1” (or maybe even “a = a.succ”), i.e. an expression
which contains an assignment. Then “b == a” under 2. above would
return “false” as well and everything is consistent again (assuming
standard implementations of + which do not change the instance but
return a new one).
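The two cases can be put side by side in today's Ruby. This is a sketch only: `Counter` is a hypothetical mutable class invented for the comparison, with `increment!` standing in for what an in-place "++" would do, while the integer case uses the existing `a += 1` idiom.

```ruby
# Hypothetical mutable counter, used only to contrast in-place mutation
# with rebinding; it is not part of Ruby.
class Counter
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def increment!
    @value += 1   # changes this object's state
    self
  end

  def ==(other)
    other.is_a?(Counter) && value == other.value
  end
end

def mutate_in_place
  a = Counter.new(1)
  b = a
  a.increment!           # same object, new state
  [b == a, b.equal?(a)]
end

def rebind_with_plus_equals
  a = 1
  b = a
  a += 1                 # a = a + 1: a now refers to a different object
  [b == a, b.equal?(a)]
end

p mutate_in_place          # [true, true]
p rebind_with_plus_equals  # [false, false]
```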
I believe Matz decided against it because a) there is a pretty short
idiom which does this already (“a += 1”) and b) that existing idiom
has an equals sign in there thus making the assignment obvious; “++a”
on the contrary does not have an equals sign anywhere and thus would
obscure the fact that under the hood an assignment is taking place.
This would make the code harder to read while not bringing any
benefits to the language.
“a=1” is an assignment meaning, it
takes whatever object reference the evaluation of the expression on
the right side yields and puts it into the storage location denoted by
“a”. There is no method invoked, certainly no method on “a” which -
as I said - is no object but just a place in storage.
But aside from copying (in this case) 1’s ID, Ruby must first look up
Not “1’s ID” is copied but the “reference to 1”. “1” is only special
in that the reference does not point to an actual object.
a’s
presence in the current scope’s portion of the symbol table and if not
present, insert it, or otherwise deal with the existing ID for the “a”
entry: (i) if the ID is an actual address of data in some pool,
decrement its reference count if positive; or (ii) do nothing for
Fixnums, true, false and nil).
AFAIK Ruby has a mark and sweep garbage collector which does not
employ reference counting.
I wouldn’t even have posted my ++ question if I could follow my way
through Ruby’s C or C++ implementation. But in about my last decade
before retirement, I was the debugger/enhancer of several
organizations’ apps, so I know how daunting the hunt through C/C++ code
is, especially an app as big as Ruby. But if you pointed me to Ruby’s
Finite State Machine (there’s GOT to be one), I’d …
I would add that using i += 1 is often doing it the wrong way in Ruby. We
use Enumerable most of the time, and try to not create any index (because
that’s awful, you need an initialization(i=0), a step to go forward(i+=1),
and you keep a useless index at the end).
Not sure about your “often” - iterating in the majority of cases can
certainly be done without indexing an Array. There are other uses for
“a += 1” though - such as counting.
I believe Matz decided against it because a) there is a pretty short
idiom which does this already (“a += 1”) and b) that existing idiom
has an equals sign in there thus making the assignment obvious; “++a”
on the contrary does not have an equals sign anywhere and thus would
obscure the fact that under the hood an assignment is taking place.
This would make the code harder to read while not bringing any
benefits to the language.
I would add that using i += 1 is often doing it the wrong way in Ruby.
We use Enumerable most of the time, and try not to create any index
(because that’s awful: you need an initialization (i=0), a step to go
forward (i+=1), and you keep a useless index at the end).
If you really need an index, you can always use #upto, #downto or #step.
Every time you need an index (that means the index of the Enumerable
thing you’re iterating is meaningful), you should use #each_with_index
or #with_index.
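A few of these idioms side by side, as a sketch; the method names and the sample words are invented for the example.

```ruby
# No index needed: let Enumerable do the iteration.
def word_lengths(words)
  words.map { |w| w.length }
end

# The index is meaningful: ask for it instead of maintaining it by hand.
def labelled(words)
  words.each_with_index.map { |w, i| "#{i}: #{w}" }
end

# Same idea with a custom starting offset.
def numbered(words)
  words.each.with_index(1).map { |w, i| "#{i}. #{w}" }
end

# Counting without a hand-rolled "n += 1".
def long_word_count(words)
  words.count { |w| w.length > 4 }
end

# Plain numeric iteration with #step.
def evens_up_to(limit)
  result = []
  0.step(limit, 2) { |n| result << n }
  result
end

words = %w[alpha beta gamma]
p word_lengths(words)
p labelled(words)
p numbered(words)
p long_word_count(words)
p evens_up_to(6)
p 3.downto(1).to_a
```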
I see your point. But I’ve got a “burr under my intellectual saddle.”
That must hurt…
Nah It’s just that sometimes, to use a different metaphor, I’m a
dog gnawing at a technology bone. But I’ve got to suspend this issue
for about 10 days while I concentrate on my brother’s family (coming
in from Europe) and my son’s (coming in from the deep South).
I’ll be staying in Europe and keep watching out - your brother can
safely travel.
But, as a Schwarzenegger character once promised, I’ll be back :-).
I feel seriously threatened.
Best wishes for the holidays,
To you, your family and all other members of the Ruby community, too.
Cheers
robert
Thanks for your additional response.
I see your point. But I’ve got a “burr under my intellectual saddle.”
That must hurt…
Nah It’s just that sometimes, to use a different metaphor, I’m a
dog gnawing at a technology bone. But I’ve got to suspend this issue
for about 10 days while I concentrate on my brother’s family (coming
in from Europe) and my son’s (coming in from the deep South).
But, as a Schwarzenegger character once promised, I’ll be back :-).