Deadlock in DRb

larsch · April 22, 2008, 1:20pm

In a program with two DRb servers running (start_service called twice), I
get the following deadlock after it has been running for a while with a
client connecting to both servers:

deadlock 0x284c748: sleep:J(0x2c84f7c) (main) - server.rb:54
deadlock 0x2c84f7c: sleep:F(4) - c:/lang/ruby/lib/ruby/1.8/drb/drb.rb:944
deadlock 0x2d01338: sleep:F(5) - c:/lang/ruby/lib/ruby/1.8/drb/drb.rb:566
deadlock 0x2c854cc: sleep:F(3) - c:/lang/ruby/lib/ruby/1.8/drb/drb.rb:944
deadlock 0x2cff81c: sleep:S - c:/lang/ruby/lib/ruby/1.8/drb/drb.rb:626
c:/lang/ruby/lib/ruby/1.8/drb/drb.rb:626: Thread(0x2cff81c): deadlock (fatal)

How can I debug this issue? I don’t understand why it is a deadlock at
all, since drb.rb:944 is a call to Socket#accept, which does not
depend purely on other Ruby threads.

Any ideas?

Lars

larsch · April 22, 2008, 3:47pm

2008/4/22, Lars C.:

In a program with two DRb servers running (start_service called twice), I

Why do you have two servers?

deadlock 0x2cff81c: sleep:S - c:/lang/ruby/lib/ruby/1.8/drb/drb.rb:626
c:/lang/ruby/lib/ruby/1.8/drb/drb.rb:626: Thread(0x2cff81c): deadlock (fatal)

How can I debug this issue? I don’t understand why it is a deadlock at
all, since drb.rb:944 is a call to Socket#accept, which does not
depend purely on other Ruby threads.

Any ideas?

For a deadlock you need at least two resources that are locked in
different order. Maybe you have synchronized calls across the two
servers that deadlock.

You could use set_trace_func to trace program execution until the
deadlock and look at the execution flow.
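
A minimal sketch of such a trace, assuming one log line per method call, tagged with the calling thread, is enough to follow the flow. The helper names start_call_trace and stop_call_trace and the output sink are made up for illustration; only set_trace_func itself is the standard Kernel method.

```ruby
# Hypothetical helper: log every method call together with the thread
# that made it, so the last lines before the deadlock can be inspected.
def start_call_trace(io = $stderr)
  set_trace_func proc { |event, file, line, id, _binding, classname|
    # Only method entry events; "line" events would flood the log.
    next unless event == "call" || event == "c-call"
    io.puts "#{Thread.current.inspect} #{event} #{classname}##{id} (#{file}:#{line})"
  }
end

# Passing nil removes the trace hook again.
def stop_call_trace
  set_trace_func(nil)
end
```

Calling start_call_trace before the service is started and reading the tail of the log after the crash shows what each thread did last. Tracing slows the program down considerably, so it may change the timing of the problem.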

Kind regards

robert

larsch · April 23, 2008, 3:11pm

2008/4/23, Lars C.:

On Apr 22, 3:45 pm, Robert K. wrote:

2008/4/22, Lars C.:

In a program with two DRb servers running (start_service called twice), I

Why do you have two servers?

Well… legacy. I have converted my application to having only 1 DRb
service started, but the same problem occurs. I still get a deadlock
after the clients have been connecting for a while.

Too bad.

For a deadlock you need at least two resources that are locked in
different order. Maybe you have synchronized calls across the two
servers that deadlock.

My main thread is blocked by DRb.thread.join. All other threads are
inside the DRb library on either Socket#accept, #read or #write.

And, are there any locks held?

How can there be a deadlock if a thread is waiting in a Socket#accept
call? As I understand it, Ruby's deadlock detection simply fires when
there is no thread left to run.

You could use set_trace_func to trace program execution until the
deadlock and look at the execution flow.

I have tried this, but it doesn’t show anything other than the
deadlock report from Ruby, i.e. that the threads are calling
Socket#accept, #read or #write and Thread#join.

These issues are next to impossible to debug without access to code
and an understanding of what the app really does. I’m afraid, I can’t
help you further right now.

Kind regards

robert

larsch · April 23, 2008, 8:20pm

On Apr 23, 2008, at 6:11 AM, Robert K. wrote:

service started, but the same problem occurs. I still get a deadlock

I have tried this, but it doesn’t show anything other than the
deadlock report from Ruby, i.e. that the threads are calling
Socket#accept, #read or #write and Thread#join.

These issues are next to impossible to debug without access to code
and an understanding of what the app really does. I’m afraid, I can’t
help you further right now.

Kind regards

robert

What version and patch level of ruby do you have? If you have ruby
1.8.6 and the patch level is less than p111 then you have a faulty
ruby interpreter with broken threading that can cause these deadlocks.
Make sure you are using ruby 1.8.5, or ruby 1.8.6p111 at minimum.
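
A quick way to check this from inside the program, as a sketch. The method name is hypothetical, and the rule it encodes is only the one stated above (1.8.6 with a patch level below 111); RUBY_VERSION and RUBY_PATCHLEVEL are the standard constants.

```ruby
# Hypothetical check for the affected interpreter range described above:
# ruby 1.8.6 with a patch level lower than 111.
def affected_by_186_threading_bug?(version = RUBY_VERSION,
                                   patchlevel = RUBY_PATCHLEVEL)
  version == "1.8.6" && patchlevel < 111
end

puts "ruby #{RUBY_VERSION} patchlevel #{RUBY_PATCHLEVEL}"
puts "affected: #{affected_by_186_threading_bug?}"
```

The same information is printed by running ruby -v on the command line.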

Cheers-

Ezra Z.
– Founder & Software Architect
– [email protected]
– EngineYard.com

larsch · April 23, 2008, 2:50pm

On Apr 22, 3:45 pm, Robert K. wrote:

2008/4/22, Lars C.:

In a program with two DRb servers running (start_service called twice), I

Why do you have two servers?

Well… legacy. I have converted my application to having only 1 DRb
service started, but the same problem occurs. I still get a deadlock
after the clients have been connecting for a while.

For a deadlock you need at least two resources that are locked in
different order. Maybe you have synchronized calls across the two
servers that deadlock.

My main thread is blocked by DRb.thread.join. All other threads are
inside the DRb library on either Socket#accept, #read or #write.

How can there be a deadlock if a thread is waiting in a Socket#accept
call? As I understand it, Ruby's deadlock detection simply fires when
there is no thread left to run.
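
To see which threads the interpreter considers runnable before the detection fires, something like the following sketch could be logged from time to time. The helper name thread_report is made up; Thread.list, Thread.main and Thread#status are standard.

```ruby
# Hypothetical helper: one line per live thread with its status.
# Thread#status returns "run", "sleep", "aborting", false or nil.
def thread_report
  Thread.list.map do |t|
    role = t == Thread.main ? "main" : "worker"
    "#{t.inspect} #{role} status=#{t.status.inspect}"
  end
end

puts thread_report
```

If every line shows a sleeping thread, nothing is left that could wake the others, which is the situation the deadlock report describes.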

You could use set_trace_func to trace program execution until the
deadlock and look at the execution flow.

I have tried this, but it doesn’t show anything other than the
deadlock report from Ruby, i.e. that the threads are calling
Socket#accept, #read or #write and Thread#join.

Lars

larsch · April 24, 2008, 2:15pm

On Apr 23, 8:19 pm, Ezra Z. wrote:

What version and patch level of ruby do you have? If you have ruby
1.8.6 and the patch level is less than p111 then you have a faulty
ruby interpreter with broken threading that can cause these deadlocks.
Make sure you are using ruby 1.8.5, or ruby 1.8.6p111 at minimum.

I had the same problem with 1.8.6p111. I finally tracked it down to a
bug in Process.create from the ‘win32-process’ gem. Some code added to
this function after version 0.5.5 would call CloseHandle on something
that was not a handle but a process or thread ID. When these are the
same as socket handles, etc, the process would sometimes deadlock,
sometimes simply close a listening socket, fail in Socket#accept, or
go into infinite loops.

http://rubyforge.org/tracker/index.php?func=detail&aid=19753&group_id=85&atid=411.

I was able to work around it by setting :close_handles => false in the
call to Process.create.
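
A sketch of that workaround, assuming the win32-process gem on Windows. The :close_handles key is the one named above; :command_line is my assumption about how the command is passed, and both helper names are hypothetical.

```ruby
# Hypothetical helper: build the option hash for Process.create.
# :close_handles => false is the workaround described above;
# :command_line is an assumed key for the command to run.
def spawn_options(command_line)
  { :command_line => command_line, :close_handles => false }
end

# Hypothetical wrapper. The gem is only loaded when this is called,
# so the file itself still loads on other platforms.
def spawn_without_closing_handles(command_line)
  require 'win32/process' # win32-process gem, Windows only
  Process.create(spawn_options(command_line))
end
```

Upgrading to a gem version in which the tracker item is fixed, once one is available, would make the extra option unnecessary.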

Lars