JRuby disabling ObjectSpace: what implications?

headius · October 29, 2007, 11:39am

2007/10/28, Charles Oliver N.:

Robert K. wrote:

On 28.10.2007 17:19, Charles Oliver N. wrote:

You just hit on exactly why we don’t use JVMTI for ObjectSpace. It
would certainly work, but it would add a lot of overhead we’d never
expect people to accept in a real application. Plus, it would track
far more object instances than we actually want tracked.

Why is that? I mean, you could selectively decide which instances to
track.

In general, though, we haven’t explored JVMTI because we want JRuby to
be the best production environment for deploying apps, and nobody will
EVER turn on JVMTI on their production servers.

Well, it depends on the overhead and on the invocation model. I
assumed you would be starting a JVM per process but your other remarks
sound more like there is one JVM for JRuby programs…

system. “Stop the world” is awful when you start breaking the ability to
do many things in parallel, as you can in JRuby.

Ok, I see I need to dive further into JRuby before I discuss this
further.

But it may be that for cases where each_object is needed, this is a
reasonable thing to do. I think if someone were to submit an
implementation of each_object that uses JVMTI, we would certainly accept
it

Hint, hint…

able to walk the object table. And I think that’s bad especially when
we’re looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can’t lock things down like
that.

I don’t understand this remark of yours. If you implement this in Java
land (as you did apparently with WeakReferences) then there is no need
to lock anything. You just traverse the list (or a copy of the list)
and if a ref has been set to null you do not pass it to the callback.

If it is some kind of native code (possibly via JNI or other
interfaces) probably more care has to be taken, although I’d assume
that JNI takes care of this (i.e. once the callback is invoked with a
non-null argument the object stays alive until after the callback
returns unless you clear that reference of course).

In that respect, traversal during #each_object is similar to traversing
an ordinary collection: a GC can occur at any point during the
traversal, but it does not affect the traversal in any way.
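
The traversal described above can be sketched in Ruby. This is only an illustration under assumptions: `WeakRegistry`, `register` and `each_live` are hypothetical names, and the standard `weakref` library stands in for Java's `WeakReference`. It is not how JRuby implements ObjectSpace.

```ruby
require 'weakref'

# Hypothetical registry (illustrative name): it holds only weak
# references, so by itself it never keeps an object alive.
class WeakRegistry
  def initialize
    @refs = []
    @lock = Mutex.new
  end

  # Record a weak reference to obj and hand obj back to the caller.
  def register(obj)
    @lock.synchronize { @refs << WeakRef.new(obj) }
    obj
  end

  # Walk a copy of the list; references whose target has been
  # collected are skipped instead of being passed to the callback.
  def each_live
    snapshot = @lock.synchronize { @refs.dup }
    snapshot.each do |ref|
      begin
        obj = ref.__getobj__ # strong reference while the callback runs
      rescue WeakRef::RefError
        next                 # target is gone, nothing to yield
      end
      yield obj
    end
  end
end
```

Only the copy of the list is taken under the lock; the callback itself runs without it, so other threads can keep registering objects while the traversal is in progress.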

What am I missing?

Kind regards

robert

headius · October 29, 2007, 1:04pm

mortee wrote:

would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that’s bad especially when
we’re looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can’t lock things down like
that.

Sorry for the extremely uninitiated and naive question - but when you’re
about to enumerate each object in an application, aren’t you interested
only in this application’s objects anyway? So why would you have to lock
anything about the other ruby apps in the same process? Is that kind of
distinguishing objects impossible on the GC/enumeration level?

As far as I know there’s no way to have JVMTI enumerate only objects
created by a specific application in a given JVM. So any sort of
ObjectSpace impl based on it would have to take that into consideration.

Charlie

headius · October 29, 2007, 1:48pm

2007/10/29, Charles Oliver N.:

The problem is not so much that the object references move as that you
distinguishing objects impossible on the GC/enumeration level?

As far as I know there’s no way to have JVMTI enumerate only objects
created by a specific application in a given JVM. So any sort of
ObjectSpace impl based on it would have to take that into consideration.

Hm, if you host different applications in the same JVM you probably
need separate class loaders anyway to separate changes on classes.
Maybe you can use that to partition the heap. Alternatively you could
use IterateOverObjectsReachableFromObject() and start from main. Just
a few wild guesses.

Btw, but the issue with stopping the world would still not go away.
Too bad. A possible solution would be to implement the callback in a
way that it places all references in a Java collection. Only after it
finishes the Ruby land callback is invoked for each instance. The
downside is that you need more space (i.e. for the collection which
could become largish) but on the plus side is that you do not have any
overhead (other than incurred by JVMTI) during “normal” operation and
you can limit the stop the world time to just the copying phase which
might be acceptable. Charles, what do you think?
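
A rough Ruby sketch of this two-phase idea, for illustration only. `each_object_snapshot` is a hypothetical helper, and `ObjectSpace.each_object` stands in here for the JVMTI heap walk, which is an assumption made so the example can run on an implementation that supports it.

```ruby
# Phase 1: copy the references into a plain Array. In the JVMTI variant
# this would be the only part that needs the world stopped.
# Phase 2: invoke the Ruby-level callback on the copy.
# Returns the number of objects visited.
def each_object_snapshot(klass)
  snapshot = []
  ObjectSpace.each_object(klass) { |obj| snapshot << obj }
  snapshot.each { |obj| yield obj }
  snapshot.size
end
```

Note that the snapshot array holds strong references to every collected object for as long as it exists, which is the space cost mentioned above.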

Kind regards

robert

headius · October 29, 2007, 2:31pm

On Oct 28, 3:39 pm, Charles Oliver N. wrote:

Actually, we do that a bit already. For example, we do not track arrays

instrumentation and that can give you access to every (reachable)

system. “Stop the world” is awful when you start breaking the ability to
do many things in parallel, as you can in JRuby.

But it may be that for cases where each_object is needed, this is a
reasonable thing to do.

Exactly. I think that each_object rarely has to go into production
code, but is very handy (and, to be honest, just fun, really) in
debugging/testing/experimenting. For those types of situations, I don’t
really think a “stop the world” approach is so terrible. I find it
less of a disturbance than having this off-code switch.

I think if someone were to submit an

headius · November 5, 2007, 6:53pm

Charles Oliver N. wrote:

As some of you may have heard, we’re considering disabling
ObjectSpace.each_object by default in JRuby.

I brought this up at RubyConf, and got about 50% of people saying “I
agree” and 50% of people saying “I do not agree”. As it stands now, we
will proceed with having ObjectSpace.each_object disabled by default in
JRuby 1.1 final. See the rest of this thread for the backstory and notes
on test/unit.

The folks who disagree appear to only disagree on principle, rather than
based on any real demonstrable problem with turning each_object off. On
the other hand, the folks who want to disable it have real-world
concerns: performance on the apps they’re running. Until there’s a
compelling, real-world, non-ideological reason to leave each_object
enabled by default, it will be disabled in JRuby (enable with +O flag or
jruby.objectspace.enabled=true property).
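
Code that has to run both with and without each_object could guard its use along these lines. This is a sketch under assumptions: `each_object_available?` is a hypothetical helper, and the exact exception raised when ObjectSpace is disabled is not stated in this thread, so the rescue is deliberately broad.

```ruby
# Returns true if ObjectSpace.each_object can enumerate ordinary objects
# in the running interpreter, false if the call is refused.
def each_object_available?
  # Stop at the first object; we only want to know whether the call works.
  ObjectSpace.each_object(String) { |_obj| break }
  true
rescue StandardError, NotImplementedError
  # Assumption: a runtime with each_object disabled raises one of these.
  false
end
```

A library could call this once at load time and fall back to its own explicit tracking when it returns false.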

This change is already there in 1.1b1, released on Friday evening.

Charlie

headius · November 5, 2007, 7:10pm

Robert K. wrote:

Btw, but the issue with stopping the world would still not go away.
Too bad. A possible solution would be to implement the callback in a
way that it places all references in a Java collection. Only after it
finishes the Ruby land callback is invoked for each instance. The
downside is that you need more space (i.e. for the collection which
could become largish) but on the plus side is that you do not have any
overhead (other than incurred by JVMTI) during “normal” operation and
you can limit the stop the world time to just the copying phase which
might be acceptable. Charles, what do you think?

It’s certainly possible to do this, but it would probably need to create
a giant strong-referenced list of objects for iteration. Part of my hard
rules for implementing ObjectSpace is that it MUST NOT interfere with an
object’s normal lifecycle.

Charlie

headius · October 29, 2007, 1:08pm

[email protected] wrote:

I think of each_object as very much an MRI implementation feature that
the rest of us implementors struggle to implement. Because of this, the
community and core members of each implementation really need to begin
discussing whether or not each_object is a Ruby feature or an MRI
feature.

That’s actually a really good point. each_object is more a feature of an
individual implementation’s memory model than a general feature that can
be applied to every Ruby implementation. In many cases, like ours, you
simply don’t have control over that memory model enough to provide a
real each_object implementation (and _id2ref requires tricks too, but
it’s at least bounded and explicit). So it may be fair to say that
each_object is an MRI feature we emulate, but cannot simulate well
enough for it to translate appropriately.

headius · November 5, 2007, 7:13pm

mortee wrote:

process-control requirements we can’t support well on JVM. But I would
also expect this use of each_object to have a “better” implementation,
and if not it could again be a specific-purpose weak hash for IO streams
(which we almost have already since we want to be able to clean them up
on exit).

Speaking of multiple cases of possible class-specific instance
tracking… isn’t it possible to register your interest in some such
class at some point explicitly from program code, so that any class
could be made enumerable?

Yes, that is possible…but it solves only part of the problem. Just
having ObjectSpace.each_object enableable through a flag allows it to be
fully functional when you want it and out of the way the rest of the
time.
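
For reference, the explicit opt-in that mortee asks about could look roughly like this in plain Ruby. `InstanceTracking` and `each_instance` are hypothetical names, not part of JRuby or Ruby; the sketch is not thread-safe and does not cover subclasses of a tracked class.

```ruby
require 'weakref'

# Hypothetical opt-in module: a class that extends it records its
# instances weakly and becomes enumerable through each_instance.
module InstanceTracking
  def self.extended(klass)
    klass.instance_variable_set(:@tracked_refs, [])
  end

  # Wrap Class#new so every new instance is registered weakly.
  def new(*args, &block)
    obj = super(*args, &block)
    @tracked_refs << WeakRef.new(obj)
    obj
  end
  # Forward keyword arguments correctly on Rubies that need this.
  ruby2_keywords(:new) if respond_to?(:ruby2_keywords, true)

  # Yield each instance that is still alive; collected ones are skipped.
  def each_instance
    @tracked_refs.dup.each do |ref|
      obj = begin
        ref.__getobj__
      rescue WeakRef::RefError
        nil
      end
      yield obj if obj
    end
  end
end
```

A class opts in with `extend InstanceTracking`, after which `SomeClass.each_instance { |obj| ... }` enumerates its live instances without touching any other class.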

Charlie