Should *most* memory be released back to the system?

If anyone can explain this I would appreciate it.

I have an IT group knocking Ruby saying that it never releases memory
back to the system (to be available to procs other than Ruby) so
feeling somewhat defensive I went and wrote a dumb script that goes
back and forth between two methods, one cat’ing huge strings and the
other parsing an xml doc with Hpricot.

“task” tops off (in top) around 50M and “other_task” peaks around 150M
(on my machine, CentOS, latest stable 1.8.6) but when we return to
running “task” for extended periods, memory usage remains at ~150M.

Forgive my ignorance. Can anyone explain this behavior or at least
point me to a place to educate myself?

Many thanks, the script I’m running is below.


#!/usr/bin/env ruby
require 'rubygems'
require 'hpricot'

# Builds and discards roughly 50 MB of strings (9999 strings of 5000 bytes).
def other_task
  a = []
  9999.times do
    a << "12345678901234567890123456789012345678901234567890" * 100
  end
  nil
end

# Parses a large XML document and discards the result.
def task
  # 500K xml data
  # join gives one big string on 1.8 and on later Rubies alike
  data = File.readlines("very_large_output.xml").join
  temp = Hpricot.XML data
  nil
end

puts "In task"
10.times {|i| task; p i}
puts "In other task"
100.times {|i| other_task; p i}
puts "In task (Should memory go down?)"
100.times {|i| task; p i}
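
Instead of watching top, the figures can be logged from inside the process. The following is only a sketch, assuming a Linux-style /proc filesystem; the helper names `memory_kb` and `report` are made up for illustration and are not part of the original script.

```ruby
# Reads VmRSS and VmSize (both in kB) for the current process.
# Assumes /proc/self/status exists (Linux); returns nil elsewhere.
def memory_kb
  status = File.read("/proc/self/status")
  rss = status[/^VmRSS:\s+(\d+)\s+kB/, 1]
  vsz = status[/^VmSize:\s+(\d+)\s+kB/, 1]
  return nil unless rss && vsz
  { :rss => rss.to_i, :vsz => vsz.to_i }
rescue SystemCallError
  nil
end

# Prints one line of memory figures, or a note if they are unavailable.
def report(label)
  mem = memory_kb
  if mem
    puts "#{label}: RSS=#{mem[:rss]} kB VSZ=#{mem[:vsz]} kB"
  else
    puts "#{label}: memory figures unavailable on this platform"
  end
end

report("start")
junk = Array.new(9999) { "1234567890" * 500 }   # roughly 50 MB of strings
report("after allocating")
junk = nil
GC.start
report("after GC")
```

Calling `report` between the three phases of the script above would show whether RSS falls back after "other_task" or stays near its peak.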

Blackie writes:

If anyone can explain this I would appreciate it.

It’s your OS (I meant kernel and libc).

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/160726
and
http://www.crowdedweb.com/articles/2006/03/22/ruby-on-rails-and-application-memory-consumption-patterns.

The second link is dead, but there may be cached copies around.

The first one shows that on some OSes, if there is a hole in your memory
usage, that hole will not be returned by your libc to the kernel, or
the kernel won’t reclaim it. Either way, your OS is not taking it
back.

The second one (if you ever find a copy) adds some more observations
and also a handy tweak for FreeBSD that causes the OS to take back
holes (IIRC).

But most importantly, one should know that RSS and VSZ are not an
Accurate Measure of Memory Usage.

YS.

I do appreciate your help (and no, that link is lost to the mists of
time as far as I can google.) :)

I do understand that they are not useful for leak detection, but this
is just observing the peak during the life of the script. During
“other_task” I can see the usage rise and fall so I know the OS is
reclaiming some memory in places…but the part I don’t understand
is why it isn’t returning to the original “task” baseline of 50M when only
“task” is running.

I hate to sound dense, but I need to convince some fairly hard-headed
sysadmins.

2007/10/11, Blackie:

I hate to sound dense, but I need to convince some fairly hard-headed
sysadmins.

:)

There are a few things to say to this.

First, it seems reasonable to hold on to memory that has once been
grabbed because you can expect that your process will need that memory
again. There is no point in allocating and deallocating memory from
the OS all the time.

Then, the fact that memory is allocated does not mean that physical
memory is actually used. IIRC pure allocation just reserves memory;
only when you access it for the first time does the OS generate a page
fault and give you physical memory. If memory is unused for a while,
chances are that it’s paged out to disk if other processes need
resources. If not, there is no problem anyway. Granted, if a machine
does not have enough virtual memory configured this can lead to
problems for long running programs, but then I’d say the machine is
probably misconfigured anyway.

Now, what was the third point I had in mind? Ah yes: I believe
JVMs typically behave the same. However, Sun’s JVM also does heavy
copying around, which might help the OS because objects will be packed
onto fewer memory pages, so that more pages are idle and can be swapped
out. I believe the current Ruby interpreter does not copy objects, but
you can verify this by looking at the source code.

All in all, this is a strange reason to ban a programming language
from a machine IMHO. Other reasons seem more reasonable to me
(management overhead for keeping the installation up to date etc.).

Kind regards

robert

On Oct 11, 2007, at 6:40 AM, Yohanes S. wrote:

But most importantly, one should know that RSS and VSZ are not an
Accurate Measure of Memory Usage.

YS.

sorry, for jumping in, but i’d love your opinion on this yohanes:

http://drawohara.tumblr.com/post/14421265

kind regards.

a @ http://codeforpeople.com/

On Thu, 11 Oct 2007 22:15:04 +0900, Blackie wrote:

During “other_task” I can see the usage rise and fall so I know the
OS is reclaiming some memory in places…but the part I don’t
understand is why it isn’t returning to the original “task” baseline of 50M
when only “task” is running.

This is not actually a problem specific to Ruby, but applies to the
majority of software which does not use a compacting allocator (this
includes C, with malloc/free[1]).

Typically, memory is allocated in blocks from the OS, and parceled out
to individual objects from there. When there are no more active objects
in a block, the block could theoretically be returned to the OS. In
practice that can be hard to do, since individual “stragglers” can
prevent entire blocks from being reclaimed. The block can still be
reused within the program to allocate new objects, but it may not be
possible to return to the OS while the program is still running[2].

Ruby does make a reasonable effort to return unused blocks (“heaps” in
the parlance of gc.c) to the OS[3], when it is possible to do so. But
it is not always possible to do so.
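
The "straggler" effect can be illustrated with a toy model. This is only a sketch and not how gc.c works: the block size, the survival rate and all helper names are assumptions chosen for illustration.

```ruby
# Toy model, not Ruby's real allocator: a "block" is a fixed-size array of
# slots, and a slot holds true while the object in it is still alive.
SLOTS_PER_BLOCK = 1000   # assumed block size, for illustration only

# Allocate `count` objects, adding a new block whenever the last one is full.
def allocate_blocks(count)
  blocks = []
  count.times do |i|
    blocks << Array.new(SLOTS_PER_BLOCK, false) if i % SLOTS_PER_BLOCK == 0
    blocks.last[i % SLOTS_PER_BLOCK] = true
  end
  blocks
end

# Free every live object except a random fraction of survivors.
def free_most(blocks, survival_rate, rng)
  blocks.each do |block|
    block.map! { |live| live && rng.rand < survival_rate }
  end
end

# A block can be handed back to the OS only if nothing in it is alive.
def releasable(blocks)
  blocks.count { |block| block.none? }
end

rng = Random.new(42)
blocks = allocate_blocks(100_000)
free_most(blocks, 0.01, rng)   # 99% of the objects become garbage
live = blocks.inject(0) { |sum, block| sum + block.count(true) }
puts "live objects: #{live} of 100000"
puts "blocks that could be returned: #{releasable(blocks)} of #{blocks.size}"
```

Even with 99% of the objects gone, it is unlikely in this model that any block ends up completely empty, because the few survivors are scattered across all of them.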

I hate to sound dense, but I need to convince some fairly hard-headed
sysadmins.

Do they use Perl? Perl 5 does not even try to return memory to the OS.

-mental

[1] malloc/free can be even worse, if implemented using brk/sbrk,
since a single live object at a high address can “pin” the entire
rest of the heap below it, not just a single block

[2] a process’ memory is always reclaimed by the OS once it exits

[3] see free_unused_heaps in Ruby’s gc.c

Yohanes S. wrote:

But most importantly, one should know that RSS and VSZ are not an
Accurate Measure of Memory Usage.

Do you mean not accurate as (1) or (2):

(1) an estimate of how much memory is used by the interpreter for
objects, program, etc.

or

(2) an estimate of how much memory the kernel has allocated to the
process.

On Fri, 12 Oct 2007 03:25:52 +0900, Sylvain J. wrote:

Ruby does make a reasonable effort to return unused blocks (“heaps” in
the parlance of gc.c) to the OS[3], when it is possible to do so. But
it is not always possible to do so.

Ruby does not free heaps. It is supposed to do so, but the way it
allocates very big heaps prevents it from doing that in practice.

That’s true; Ruby tends to use relatively large heap sizes, which
makes it unlikely for there to be any unused heaps which can be freed.

Using per-heap freelists and preferring newer heaps for allocation
would help, although it would mean additional overhead to determine
which heap a freed object belonged to.

-mental

Ruby does make a reasonable effort to return unused blocks (“heaps” in
the parlance of gc.c) to the OS[3], when it is possible to do so. But
it is not always possible to do so.

Ruby does not free heaps. It is supposed to do so, but the way it
allocates very big heaps prevents it from doing that in practice.
Search the ‘gc.c – possible logic error?’ thread on ruby-core for details.

Nonetheless, you have a lot of objects in less than 10M of heaps (and I
mean a lot), so the 150M memory usage is certainly not due to that.

preferring newer heaps for allocation would help

Actually, it does not really. The probability of having at least one
object kept in a heap of, say, 10000 slots (the minimum heap size) is
quite high. I’ll have more raw data to show when I have time to do a
page about them.
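
As a rough back-of-the-envelope check of that claim, assume (purely for illustration) that each slot independently holds a surviving object with some small probability. Real object lifetimes are not independent, so this is a sketch, and `empty_heap_probability` is a hypothetical helper.

```ruby
# Probability that a heap of `slots` slots contains no surviving object,
# assuming each slot survives independently with `survival_rate`.
# The independence assumption is a simplification for illustration.
def empty_heap_probability(slots, survival_rate)
  (1.0 - survival_rate) ** slots
end

[0.01, 0.001, 0.0001].each do |rate|
  printf("survival rate %.4f -> P(heap of 10000 slots is empty) = %.3g\n",
         rate, empty_heap_probability(10_000, rate))
end
```

Under this assumption, even a survival rate of one object in a thousand leaves almost no chance that a 10000-slot heap is entirely free.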

Sylvain

Yohanes S. wrote:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/160726
and
http://www.crowdedweb.com/articles/2006/03/22/ruby-on-rails-and-application-memory-consumption-patterns.

The second link is dead, but there may be cached copies around.

Ah, good ol’ archive.org
Ruby on Rails and Application Memory Consumption Patterns

Thank you everyone. I can now make some relatively informed and rational
explanations to my cohorts here.

Have a great day!

2007/10/12, Blackie:

Thank you everyone. I can now make some relatively informed and rational
explanations to my cohorts here.

Let us know the outcome. :)

robert

“ara.t.howard” writes:

On Oct 11, 2007, at 6:40 AM, Yohanes S. wrote:

But most importantly, one should know that RSS and VSZ are not an
Accurate Measure of Memory Usage.

YS.

sorry, for jumping in, but i’d love your opinion on this yohanes:

http://drawohara.tumblr.com/post/14421265

Ah, I’m sorry. I have been swamped recently and have not been able to
read this mailbox.

Too bad you’re on Darwin. Had you been on Linux (and on glibc), I’d
suggest patching ruby so that it executes this first thing after
it starts:

mallopt(M_MMAP_THRESHOLD, 0); /* declared in malloc.h */

What this does is make every allocation use mmap instead of
sbrk. This allows every free() to return the allocated space back to the
kernel. Doing this eliminates the possibility that VSZ climbs because
of memory fragmentation. If VSZ still climbs, then there is some
garbage somewhere that is not being released. OTOH, this causes a
syscall for every allocation.

I hope there is a suitable equivalent in Darwin.
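
For a quick experiment without patching the interpreter, glibc also reads the same setting from the `MALLOC_MMAP_THRESHOLD_` environment variable (note the trailing underscore; see mallopt(3)). The sketch below assumes Linux with glibc and a Ruby linked against glibc's malloc; `run_with_mmap_malloc` is a hypothetical helper name.

```ruby
require "rbconfig"

# Environment override read by glibc's malloc at startup. It is assumed
# here that other C libraries and allocators simply ignore it.
MMAP_ENV = { "MALLOC_MMAP_THRESHOLD_" => "0" }

# Hypothetical helper: runs a child Ruby with the override in place and
# returns true if the child exited successfully.
def run_with_mmap_malloc(*ruby_args)
  system(MMAP_ENV, RbConfig.ruby, *ruby_args)
end

# When run directly with arguments, pass them on to the child Ruby.
if __FILE__ == $0 && !ARGV.empty?
  exit(run_with_mmap_malloc(*ARGV) ? 0 : 1)
end
```

As with the patch, this is meant for diagnosis only, since every allocation in the child then goes through a system call.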

YS.

Joel VanderWerf writes:

(2) an estimate of how much memory the kernel has allocated to the process.

It’s #1.

For #2, it’s quite difficult what with shareable memory and such. VSZ
is a good approximation for that if there isn’t that much shareable
memory.

YS.

“M. Edward (Ed) Borasky” writes:

Is this worth making part of the standard Ruby build on Linux?

Definitely not. This is just a diagnostic. It’s also quite expensive
because you incur a syscall for each memory allocation and deallocation.

If VSZ climbs because of memory fragmentation, then it is really a
perception problem. No real memory is wasted unnecessarily, thanks to
the memory management the kernel does for the process. The only real
annoyance is that when the process terminates, the kernel will swap
in all those pages. This annoyance could border on becoming a problem,
depending on your requirements. Some OSes, like FreeBSD, have a special
option that disables the swapping in of all those pages when a process
terminates[1].

If VSZ climbs because garbage is not being collected, then you
can try to fix the problem. Most probably it is fixable from user
code (which is holding references to unnecessary objects). If the
problem is caused by quirks in ruby’s GC (few, but potentially very
difficult to fix), then it’s a toss-up. But I still say the cost is
probably not worth it.

I don’t favour the long-running process model for servers. I prefer to
fork() for each request. So I’m rarely bothered by whatever quirks of
ruby’s GC I may have triggered. I understand that this approach
is not trendy anymore and RoR does not support this model, but I’m
just throwing it out in the open as an alternative work-around where
possible.
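
The fork-per-request idea can be sketched as follows. This assumes a Unix-like system where `fork` is available; passing the result back through a pipe with Marshal is just one possible choice, and `run_in_child` is a hypothetical helper.

```ruby
# Runs the block in a forked child and returns its (marshalable) result.
# Whatever memory the block allocates belongs to the child, so the OS
# reclaims all of it when the child exits.
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield))
    writer.close
    exit!(0)               # leave immediately, skipping at_exit handlers
  end
  writer.close
  data = reader.read       # read until the child closes its end
  reader.close
  Process.wait(pid)
  Marshal.load(data)
end

# The "request" builds about 50 MB of strings, but only in the child.
count = run_in_child do
  Array.new(9999) { "1234567890" * 500 }.size
end
puts "child handled #{count} strings; parent footprint is unaffected"
```

The parent stays small because the large allocation never happens in its address space. The cost is one fork and one pipe per request.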

YS.

Footnotes:
[1] Thanks to Daniel DeLorme for the link:
http://web.archive.org/web/20061023143846/www.crowdedweb.com/articles/2006/03/22/ruby-on-rails-and-application-memory-consumption-patterns

On Oct 18, 2007, at 9:32 AM, Yohanes S. wrote:

I don’t favour the long-running process model for servers. I prefer to
fork() for each request. So I’m rarely bothered by whatever quirks of
ruby’s GC I may have triggered. I understand that this approach
is not trendy anymore and RoR does not support this model, but I’m
just throwing it out in the open as an alternative work-around where
possible.

that’s quite interesting because, while i’m not the memory expert you
are, i’ve settled on exactly that model for the many many server
processes i’ve written for 24x7 systems: the robustness simply cannot
be beaten.

kind regards.

a @ http://codeforpeople.com/

Yohanes S. wrote:

http://drawohara.tumblr.com/post/14421265
What this does is make every allocation use mmap instead of
sbrk. This allows every free() to return the allocated space back to the
kernel. Doing this eliminates the possibility that VSZ climbs because
of memory fragmentation. If VSZ still climbs, then there is some
garbage somewhere that is not being released. OTOH, this causes a
syscall for every allocation.

I hope there is a suitable equivalent in Darwin.

YS.

Is this worth making part of the standard Ruby build on Linux?

On Thu, 18 Oct 2007 23:31:13 +0900, “M. Edward (Ed) Borasky”
wrote:

Is this worth making part of the standard Ruby build on Linux?

The downside is that using mmap for every allocation can result in
a large number of distinct memory mappings, hurting performance.

Ideally, Ruby could allocate heaps using mmap rather than malloc,
on platforms where it was feasible to do so (generally this means
private mappings of /dev/zero, which not all Unices support). On
Windows you’d probably want to use VirtualAlloc().

-mental

Yohanes S. wrote:

mallopt(M_MMAP_THRESHOLD, 0); /* declared in malloc.h */

Very interesting. I’d like to use this as a diagnostic.

I patched ruby’s main.c to call mallopt() before anything else. It seems
to be using a huge amount of memory, though. Is this normal? Starting
the following:

$ ruby -e 'sleep 100'

Causes:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21422 vjoel 17 0 142m 140m 1892 S 0.0 13.9 0:00.62 ruby

I have 1.8.6p110 and ubuntu 7.04:

$ ruby -v
ruby 1.8.6 (2007-09-23 patchlevel 110) [i686-linux]
$ uname -a
Linux tumbleweed 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007
i686 GNU/Linux