Thread#raise, Thread#kill, and timeout.rb are unsafe

On Mar 18, 2008, at 7:44 AM, Gary W. wrote:

Here is a related talk/paper that offers a slightly different
perspective than Ousterhout: http://swtch.com/~rsc/talks/threads07/

with 42 slides - it must be good!

:wink:

a @ http://codeforpeople.com/

ara howard wrote:

…

event_loop do
  stuff
  if exception = queue.pop
    …
  end
end

seems like it would address a ton of issues without overly complicating
ruby’s internals - just provide an ‘on raise’ handler.

Is it possible to integrate this with IO#select, so a thread can handle
an exception sent to it while waiting on io?

On Wed, 19 Mar 2008 00:20:54 +0900, ara howard [email protected]
wrote:

stuff
if exception = queue.pop
…
end
end

seems like it would address a ton of issues without overly
complicating ruby’s internals - just provide an ‘on raise’ handler.

How do you plan on protecting against non-determinism introduced
by the “on raise” handler? For example, if queue were a
user-implemented data structure, and the handler fired in the middle
of queue.pop? If queue.pop is not protected by a lock, there is
potential to corrupt the queue object. If it is protected by a lock,
there is potential for deadlock.

(This is basically the same issue as asynchronous signal safety.)
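
To make the hazard concrete, here is a minimal sketch; the “on raise”
handler itself is hypothetical, and only the queue class is ordinary Ruby:

require 'thread'

# A user-implemented queue, protected by a Mutex.
class LockedQueue
  def initialize
    @lock  = Mutex.new
    @items = []
  end

  # Suppose a hypothetical "on raise" handler fires while the target thread
  # is inside pop (holding @lock) and the handler itself calls push: the same
  # thread tries to re-acquire a non-reentrant Mutex and deadlocks.  Without
  # the Mutex, an interrupted pop on a more elaborate structure could be left
  # half-modified.
  def push(obj)
    @lock.synchronize { @items << obj }
  end

  def pop
    @lock.synchronize { @items.shift }
  end
end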

I think the only way around this is for the receiving thread to
explicitly choose moments when it wants to receive any pending
asynchronous exceptions. Can you think of another way?

-mental

On Tue, 18 Mar 2008 22:44:42 +0900, Gary W. [email protected] wrote:

Here is a related talk/paper that offers a slightly different
perspective than Ousterhout: http://swtch.com/~rsc/talks/threads07/

+1; those slides are quite good.

-mental

On Tue, 18 Mar 2008 14:04:40 +0900, Tanaka A. [email protected] wrote:

Thread.check_interrupt is a safe point.

Blocking operations in Thread.blocking_interruptible are
safe points.

This sounds very good! I hadn’t considered anything like
blocking_interruptible, but it seems useful.

-mental

Thread.current.raises do |exception|
  queue.push exception
end

…

event_loop do
  long_running_stuff
  if exception = queue.pop
    …
  end
end

seems like it would address a ton of issues without overly
complicating ruby’s internals - just provide an ‘on raise’ handler.

This looks nice!

I wish that a handler like this could be “unregistered” as well, so you
have the flexibility to handle exceptions only in parts of the code.
Say we name your first block of code queue_future_exceptions.

Then you could have code like

Thread.current.queue_future_exceptions

# code in here is uninterruptible

# so I don't have to worry about exceptions EVERYWHERE in my code, only
# at the end of certain blocks:
Thread.current.no_longer_queue_future_exceptions
Thread.current.raises do |exception| ; end

So basically it would allow for uninterruptible blocks. That’s what I
wish we had.

In response to Mental’s questions about concurrency problems when raises
are injected while you are in the middle of handling raises, some options
that might help:

  1. protect the queue with mutexes–add them to the end of the queue
    atomically.
  2. dare I say it, require the exception handling to be either within or
    without of an “uninterruptible block” --that is, allow the exception
    handling to be interrupted (itself), or prevent it from being
    interrupted, always, to provide for sanity.

The first alternative (allowing handling outside of an uninterruptible
block) would let us “not have to worry” about concurrency when adding to
the exception queue, since we no longer add to the queue but just raise
on the thread. It would allow the handling itself to be interrupted,
which might be annoying; and isn’t our purpose to “avoid” those pesky
asynchronous interrupts?

The second alternative (requiring handling within an uninterruptible
block) avoids the handling’s being interrupted; however, it could “miss”
an interrupt or two, should they be raised between when we handle them
and when an “uninterruptible block” ends.

So I guess the best way to deal with it would be to surround queue
addition with mutexes, and handle them both right before an
uninterruptible block ends, and again right after it ends.

Ok so not such a great idea.
Thoughts?

On Wed, 19 Mar 2008 04:13:19 +0900, ara howard [email protected]
wrote:

On Mar 18, 2008, at 11:52 AM, MenTaLguY wrote:

I think the only way around this is for the receiving thread to
explicitly choose moments when it wants to receive any pending
asynchronous exceptions. Can you think of another way?

perhaps i wasn’t being clear. that’s exactly what i was showing - a
handler that gets called (critical section of course) which simply
enqueues the exception. then the thread itself must check for these at
the ‘right’ moment.

Okay. I’d thought you were suggesting an API rather than illustrating
an idea.

-mental

On Mar 18, 2008, at 11:52 AM, MenTaLguY wrote:

I think the only way around this is for the receiving thread to
explicitly choose moments when it wants to receive any pending
asynchronous exceptions. Can you think of another way?

perhaps i wasn’t being clear. that’s exactly what i was showing - a
handler that gets called (critical section of course) which simply
enqueues the exception. then the thread itself must check for these at
the ‘right’ moment.

a @ http://codeforpeople.com/

On Wed, 19 Mar 2008 03:16:05 +0900, Roger P. [email protected]
wrote:

In response to Mental’s questions about concurrency problems when raises
are injected while you are in the middle of handling raises, some options
that might help:

  1. protect the queue with mutexes–add them to the end of the queue
    atomically.

This would deadlock if the handler happened to fire while the queue’s
lock was held. Generally, locking doesn’t mix with anything that can
interrupt or suspend the execution of a thread at an arbitrary time.

  2. dare I say it, require the exception handling to be either within or
    without of an “uninterruptible block” --that is, allow the exception
    handling to be interrupted (itself), or prevent it from being
    interrupted, always, to provide for sanity.

That’s a good point – besides conflicts between the handler and whatever
the recipient thread is running at the time, you also need to be
concerned with conflicts between multiple handler instances (if that
is allowed).

So I guess the best way to deal with it would be to surround queue
addition with mutexes, and handle them both right before an
uninterruptible block ends, and again right after it ends.

Ok so not such a great idea.

In terms of an API for signal queuing, I think Tanaka A.'s idea
is elegant and reliable.

-mental

Hi,

In message “Re: Thread#raise, Thread#kill, and timeout.rb are unsafe”
on Tue, 18 Mar 2008 23:02:24 +0900, Tanaka A. [email protected]
writes:

|> This did have a few weird side-effects though. For example, running a
|> complex regex would shut off signal handling for the duration. That
|> meant you could no longer timeout a regex.
|
|Ruby already defers signal handling, same as Perl.

1.8 regexp does check interrupts during match, 1.9 doesn’t check (yet).

          matz.

On Tue, Mar 18, 2008 at 02:04:40PM +0900, Tanaka A. wrote:

It is because Thread.check_interrupt with blocking operations causes
race conditions.

Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

The same is true for any operation for which partial success is
possible.

So the application must choose whether to make a blocking operation
interruption-safe or uninterruptible.

  • Thread.blocking_interruptible(klass) { … }
  • Thread.blocking_uninterruptible(klass) { … }

Interesting. Which of these two is the default?

Another idea is about the ensure clause. Since an ensure clause is used
to release a resource, it may delay asynchronous events, as in
Thread.delay_interrupt.

Reminds me of an old C++ interview question we used to joke about:

“While digging through source code, you discover an application which
executes entirely in the body of the destructor of a global object.
Give one or more reasons why the author may have chosen to implement the
application this way.”

Paul

On 18/03/2008, ara howard [email protected] wrote:

no doubt you’ve read it, but for posterity i’ll post it here:

http://209.85.207.104/search?q=cache:2T61vSNQ4_EJ:home.pacbell.net/ouster/threads.pdf+"why+threads+are+a+bad+idea"&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a

Whatever it’s supposed to be, the link does not work for me.

Thanks

Michal

On Thu, 20 Mar 2008 02:29:09 +0900, Paul B. [email protected]
wrote:

The same is true for any operation for which partial success is
possible.

That’s a good point. To some extent it is an API design and
implementation problem; an API which can partially “succeed” in that
fashion is arguably flawed.

There are really two alternatives:

  1. fix the operation so that it succeeds or fails atomically

  2. otherwise, make the operation always non-interruptable

(Well, there is also a third solution, used by Java’s NIO for blocking
reads and writes: on interrupt, destroy the object in question so that
it doesn’t matter anymore. But I don’t think it’s a particularly good
solution.)

-mental

On Thu, 20 Mar 2008 05:57:02 +0900, Paul B. [email protected]
wrote:

At least, that’s how it works at the C level with respect to signals.

Yes. For my purposes, partial success is still success as long
as the caller can tell how far it got. Even in the absence of
signals, a successful write() isn’t guaranteed to write everything
that was requested. The important thing is that the operation
doesn’t report total failure when it really succeeded, or vice-versa.

Historical example: under SVR4, a write() interrupted by a signal
would always fail with EINTR, even if some bytes had already
been written. POSIX later remedied this, so that write() would
always report success if any bytes were written, even if it was
interrupted by a signal. In that instance, the API was modified
to conform to alternative #1.

Now, the question with IO and asynchronous exceptions in Ruby is whether
there is a good way to report a partial result.

-mental

On Thu, Mar 20, 2008 at 03:50:13AM +0900, MenTaLguY wrote:

There are really two alternatives:

  1. fix the operation so that it succeeds or fails atomically

  2. otherwise, make the operation always non-interruptable

For I/O operations, it is typically desirable for the operation to be
interruptible, but to get the result of the partial operation after it
is interrupted. At least, that’s how it works at the C level with
respect to signals.

Paul

On Thu, 20 Mar 2008 07:21:06 +0900, MenTaLguY [email protected] wrote:

Yes. For my purposes, partial success is still success as long
as the caller can tell how far it got.

(For the sake of those following along who might not have a lot of
C or Unix experience, I forgot to mention that a successful write()
always returns the number of bytes actually written. The issue with
SVR4 Unix was that interrupted writes were always treated like
failures, so no count was returned in those cases and you couldn’t
tell how much you had left over in case you needed to try again.)
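
A plain-Ruby illustration of the same point, with no asynchronous
exceptions involved (syswrite, like write(), may write fewer bytes than
requested and returns the count):

# the caller needs the partial count to know where to resume
def write_all(io, data)
  until data.empty?
    written = io.syswrite(data)  # number of bytes actually written
    data    = data[written..-1]  # retry from where the last call stopped
  end
end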

-mental

I still like Ara’s idea :slight_smile:

However, is something like this already possible if we redefine the
Thread#raise method? For example, define it to be something like this:

require 'thread'   # Mutex

class Thread
  # lazy accessors so the sketch runs standalone
  def raise_mutex;       @raise_mutex       ||= Mutex.new; end
  def queued_interrupts; @queued_interrupts ||= [];        end

  attr_reader :currently_non_interruptible_section

  # keep a handle on the original Thread#raise so raise_safe can still
  # deliver immediately when we are not inside an uninterruptible block
  alias_method :unsafe_raise, :raise

  # sorry, couldn't think of a better way than a mutex to prohibit weirdness
  # from occurring.  So that's where a problem would occur: in the ensure
  # block of the synchronize call.  Ahh well.  If you re-aliased raise to be
  # this method, maybe avoided.
  def raise_safe(*args)
    raise_mutex.synchronize {
      if @currently_non_interruptible_section
        queued_interrupts.concat(args)
      else
        unsafe_raise(*args)
      end
    }
  end

  def uninterruptible_block
    # already within one.  Maybe should keep a count of how deep you are;
    # not sure.
    return yield if @currently_non_interruptible_section
    @currently_non_interruptible_section = true
    begin
      yield # might want to handle queued exceptions at the end of this block
    ensure
      raise_mutex.synchronize { @currently_non_interruptible_section = false }
    end
    # after this point the thread is subject to interruptions again
  end
end

Then you could have a thread do something like

Thread.current.uninterruptible_block do
  # some stuff

  # handle the queued interrupts if desired.  Note that you could loop
  # and 'check if you've been interrupted yet' once per loop or what not,
  # if desired
end

# handle queued interrupts, if desired

timeout would possibly become

def timeout(sec, exception = Error)   # Error == Timeout::Error, as in timeout.rb
  return yield if sec == nil or sec.zero?
  raise ThreadError, "timeout within critical session" if Thread.critical

  # this is necessary since we do use 'real' raise this time to interrupt our
  # block's yielding.  So if you had ever nested them, the raise of an
  # uppermost one could interrupt a lower one in its ensure, and the same
  # problem would occur; we can only have one at a time.  This is a hack that
  # assumes we 'only' use uninterruptible_block for calls to timeout.
  if Thread.current.currently_non_interruptible_section
    raise ThreadError,
          "called nested functions that want to use uninterruptible and raise on it"
  end

  death_mutex = Mutex.new

  Thread.current.uninterruptible_block do
    begin
      x = Thread.current
      y = Thread.start {
        sleep sec
        death_mutex.synchronize {
          # this is not raise_safe, but real raise.  Still unsafe as per some
          # previous comments... but is it safer than it used to be now?
          x.raise exception, "execution expired" if x.alive?
        }
      }
      yield sec # would this code be hampered somehow?
    ensure
      death_mutex.synchronize {
        y.kill if y and y.alive?
      }
    end
  end

  # or something like that:
  for interrupt in Thread.current.queued_interrupts do; raise interrupt; end
end

now you could “replace” the use of raise with raise_safe. Would this
work?
This would prohibit ‘other threads’ from interrupting a thread that is
within a timeout block (obviously bad) until it timed out, but would it
be thread safe?
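
For example, a hypothetical usage sketch assuming the definitions above,
where slow_operation stands in for whatever blocking work the caller
performs:

require 'timeout'   # only for the Timeout::Error constant

t = Thread.new do
  begin
    timeout(2, Timeout::Error) { slow_operation }
  rescue Timeout::Error
    # expired: the watcher thread delivered the raise while we were inside
    # uninterruptible_block
  end
end

# another thread that wants to interrupt t must now use raise_safe, which
# queues the exception until t leaves its uninterruptible_block:
t.raise_safe(Interrupt)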

Thoughts?
Thank you for reading :slight_smile:

Overall I’d say I’m in agreement with the OP’s call to deprecate
Thread#kill, Thread#raise, and Thread#critical=, as they can too easily
hurt, though something like this might help. This suggestion also seems
far less complete than Tanaka’s, somehow.

Still wishing for an ensure block that was somehow ‘guaranteed’ to run
all the way through, and delay interruptions or ignore them till it
ended.
Thanks.

-R

Tanaka A. wrote:

In article [email protected],
Yukihiro M. [email protected] writes:

What should the safe point declaration mechanism look like?

Basically, asynchronous events should be queued.

  • make a queue for each thread.
  • Thread#raise(exc) enqueue exc to the queue.
  • Thread.check_interrupt(klass) checks an exception which is
    a kind of klass in the queue of the current thread. If it
    exists, the exception is raised.
  • other queue examining mechanism, such as sigpending, etc,
    may be useful.

Thread.check_interrupt is a safe point.

However, Thread.check_interrupt is not the only kind of safe
point. Blocking operations, such as I/O, may or may not be safe
points. This is because Thread.check_interrupt with blocking
operations causes race conditions. So the application must
choose whether to make a blocking operation interruption-safe
or uninterruptible.

  • Thread.blocking_interruptible(klass) { … }
  • Thread.blocking_uninterruptible(klass) { … }

Blocking operations in Thread.blocking_interruptible are
safe points.

If a thread has no resources to release, it is safe to kill
asynchronously. Other than that, safe points should be
declared explicitly. When a resource is acquired,
the application should declare it.

  • Thread.delay_interrupt(klass) { … }

Safe points are those outside the outermost
Thread.delay_interrupt.

Another idea is about the ensure clause. Since an ensure
clause is used to release a resource, it may delay
asynchronous events, as in Thread.delay_interrupt.
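
For concreteness, here is a usage sketch of this proposed API (none of
these methods exist in Ruby today; the names and semantics follow the
description above, and sock, acquire_connection, and process are
hypothetical placeholders):

worker = Thread.new do
  loop do
    # a blocking operation explicitly declared interruptible (a safe point):
    Thread.blocking_interruptible(Interrupt) { sock.read }

    # a resource is held here, so asynchronous events are deferred until the
    # block ends:
    Thread.delay_interrupt(Interrupt) do
      conn = acquire_connection
      begin
        process(conn)
      ensure
        conn.close
      end
    end

    Thread.check_interrupt(Interrupt)  # explicit safe point between iterations
  end
end

worker.raise(Interrupt)                # queued; delivered only at a safe point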

Tanaka A. wrote:

Thread.check_interrupt is a safe point.

Agreed. This is essentially how these unsafe operations are implemented
in JRuby currently. The “raise queue” is one element long, and
appropriate locks are acquired to modify it from outside the target
thread. All threads periodically check to see if they’ve been “raised”
or “killed” or whether they should sleep because another thread has
“gone critical”. The checkpoints are currently at the same points during
execution where Ruby 1.8 checks whether it should run the thread
scheduler.

However, Thread.check_interrupt is not the only kind of safe
point. Blocking operations, such as I/O, may or may not be safe
points. This is because Thread.check_interrupt with blocking
operations causes race conditions. So the application must
choose whether to make a blocking operation interruption-safe
or uninterruptible.

  • Thread.blocking_interruptible(klass) { … }
  • Thread.blocking_uninterruptible(klass) { … }

Also good. Awful names though :slight_smile:

Safe points are those outside the outermost
Thread.delay_interrupt.

Another idea is about the ensure clause. Since an ensure
clause is used to release a resource, it may delay
asynchronous events, as in Thread.delay_interrupt.

As mental has said, if threads are uninterruptible by default, this
would make ensures safe. I think that’s the only reasonable option.

  • Charlie

ara howard wrote:

handler that gets called (critical section of course) which simply
enqueues the exception. then the thread itself must check for these at
the ‘right’ moment.

Presumably by “critical section” you mean “locking on as narrow a lock
as possible so as not to stop the whole freaking world because one
thread wants to send an event to another”. critical= is the devil.

  • Charlie