Rick DeNatale wrote:
Reference counting is pretty much an obsolete approach to GC. It was
probably the first approach taken for lisp back in the 1950s. Other
language implementations usually started with reference counting (e.g.
the first Smalltalk).
It’s main advantage is that it’s easy to understand.
I don’t think reference counting is any easier to understand than pure
mark-and-sweep or pure stop-and-copy. The main advantage of reference
counting in my opinion is that its restrictions force you to kick some
features out of your language design if you want to use it.
Mark and sweep, such as is used in the Ruby 1.8 implementation quickly
replaced reference counting as the simplest GC considered for real
use.
My recollection is that mark-and-sweep was the original, and that
reference counting came later.
More modern GCs tend to use copying GCs which move live objects to new
heap blocks leaving the dead ones behind. And most use generational
scavenging which takes advantage of the observation that most objects
either die quite young, or live a long time. This approach was
pioneered by David Ungar in the Berkeley implementation of
Smalltalk-80. And this is the kind of GC typically used in JVMs
today.
Bah … I actually found a reference a couple of days ago on this
(http://portal.acm.org/citation.cfm?id=91597). If you’re not signed up
for the ACM library it will cost you money to read it. But essentially
“pure” mark-and-sweep was replaced by stop-and-copy, which compacts the
heap. Then generational mark-and-sweep came along and “rehabilitated”
mark-and-sweep. Note the publication date – 1990. The abstract is free
– it reads:
“Stop-and-copy garbage collection has been preferred to mark-and-sweep
collection in the last decade because its collection time is
proportional to the size of reachable data and not to the memory size.
This paper compares the CPU overhead and the memory requirements of the
two collection algorithms extended with generations, and finds that
mark-and-sweep collection requires at most a small amount of additional
CPU overhead (3-6%) but, requires an average of 20% (and up to 40%) less
memory to achieve the same page fault rate. The comparison is based on
results obtained using trace-driven simulation with large Common Lisp
programs.”
Which particular GC approach is best for Ruby is subject to some study.
I think at least for Rails on Linux, someone (assuming funding) could
collect and analyze plenty of data. I’d actually be surprised if someone
isn’t doing it, although I know I’m not.
Many of the usages of ruby aren’t quite like those of Java, or
Smalltalk. I had dinner with a former colleague, who happens to be
the lead developer of the IBM J9 java virtual machine, and he made the
observation that Java, and Smalltalk before it have a long history of
having their VMs tuned for long running processes. On the other hand
many Ruby usages are get in and get out. These use cases mean that
it’s more valuable to have rapid startup than perfect GC in the sense
that all dead objects are reclaimed quickly, not that any of the
current GCs guarantee the latter.
Well … OK. If you want to distinguish between long running (server)
and rapid startup (client), that’s fine. But look at the marketplace. We
have servers, we have laptop clients, we have desktop clients, we have
mobile clients, and we have bazillions of non-user-programmable
computers like DVD players, iPods, in-vehicle navigation systems, etc.
Now while the hard-core hackers like me wouldn’t buy an iPod or a DVD
player, preferring instead to add hard drive space to a real computer,
Apple isn’t exactly going broke making iPods and iPhones that are (for
the moment, anyhow) closed to “outsiders”. And I’m guessing that, while
you can run Ruby on, say, an embedded ARM/Linux platform, most of the
software in those gizmos is written in C and heavily optimized.
I’ve got a couple of embedded toolkits, and I’ve actually built Ruby for
them, but when you only have 32 MB of RAM, you don’t want to collect
garbage – you don’t even want to generate garbage! So I wouldn’t
personally spend much time thinking about garbage collection for rapid
startup. If you want rapid startup, you’re going to have as much binding
as possible done at compile time – you aren’t even going to compile a
Ruby script to an AST when you start a process up.
So the best GC for Ruby might not be the same as would be used for a
JVM or Smalltalk VM, but I’m almost certain it would be a reference
counter.
Did you mean to say, “not be a reference counter”?