Thread#raise, Thread#kill, and timeout.rb are unsafe

ara howard wrote:

no doubt you’ve read it, but for posterity i’ll post it here:

http://209.85.207.104/search?q=cache:2T61vSNQ4_EJ:home.pacbell.net/ouster/threads.pdf+"why+threads+are+a+bad+idea"&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a

I don’t think we’ll get anywhere solving the problem by saying “let’s
just remove threads” or “don’t use threads”.

Threads are hard, yes. Threads are too hard for most programmers, yes.
But since they’re in Ruby, they’re going to get used…and providing
provably unsafe operations on threads is obviously not making matters
better.

Besides, even if you avoid using threads, the libraries you call might
not. So we need a global answer.

  • Charlie

MenTaLguY wrote:

– something which would have worked fine with normal use of locks, but
becomes a problem once threads can “stop the world”).

And to clarify, this problem isn’t restricted to native threads…even a
green-threaded model could deadlock if one thread goes critical while
other threads are holding unrelated locks.

  • Charlie

On Mar 22, 2008, at 12:04 AM, Charles Oliver N. wrote:

Presumably by “critical section” you mean “locking on as narrow a
lock as possible so as not to stop the whole freaking world because
one thread wants to send an event to another”. critical= is the devil

yes - lock only long enough to push the exception onto a stack/queue
where it can be checked at an opportune moment - as opposed to async
exceptions just bubbling in whenever.

a @ http://drawohara.com/
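The stack/queue approach ara describes above can be sketched in a few lines of Ruby. The `SafePoint` name and API below are invented for illustration; the point is simply that other threads enqueue exceptions and the owning thread raises them only at moments it chooses, never mid-`ensure`:

```ruby
require 'thread'

# Hypothetical sketch of the queue-based approach: other threads enqueue
# exceptions; the owning thread delivers them only at safe points of its
# own choosing, instead of being interrupted asynchronously.
class SafePoint
  def initialize
    @pending = Queue.new
  end

  # Called from other threads: enqueue instead of Thread#raise.
  def signal(exception)
    @pending << exception
  end

  # Called by the owning thread at an opportune moment.
  def check!
    raise @pending.pop unless @pending.empty?
  end
end

sp = SafePoint.new
worker = Thread.new do
  begin
    loop do
      sp.check!    # exceptions are only ever delivered here
      sleep 0.01   # simulated work
    end
  rescue RuntimeError => e
    e.message
  end
end

sp.signal(RuntimeError.new("shutdown"))
worker.value  # => "shutdown"
```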

In article 3bc15bbea3f2a64de52c462c8e32d6c3@localhost,
MenTaLguY [email protected] writes:

This sounds very good! I hadn’t considered anything like
blocking_interruptible, but it seems useful.

The idea comes from pthread's cancellation points.

On Mar 21, 2008, at 11:55 PM, Charles Oliver N. wrote:

I don’t think we’ll get anywhere solving the problem by saying
“let’s just remove threads” or “don’t use threads”.

indeed. the only sane approach, i think, is the middle one whereby
threads are there, exceptions are there, but cross-thread exceptions are
delivered via channels and events.

a @ http://codeforpeople.com/

In article [email protected],
Paul B. [email protected] writes:

Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

It is a very difficult problem.

I don’t try to solve the problem perfectly.

I think data loss may be acceptable if an asynchronous
event is for termination.

If the purpose is termination, data loss may happen
anyway. For example, data from a TCP connection is lost if
the process exits before the data arrives.

If data loss is acceptable but termination delay is not,
Thread.blocking_interruptible(klass) { I/O } can be used.

If data loss is not acceptable but termination delay is,
Thread.blocking_uninterruptible(klass) { I/O } can be used.

If neither data loss nor termination delay is acceptable, it
is difficult. One idea is
Thread.blocking_interruptible(klass) { IO.select } plus
nonblocking I/O. But nonblocking I/O inherently produces
partial results, so there must be a read/write buffer. If
the termination procedure ignores the read buffer, data loss
occurs. If the termination procedure flushes the write
buffer, it may block. So we get data loss or termination
delay again. It is a difficult problem.

I hope either data loss or termination delay is acceptable.

If an asynchronous event is used for something other than
termination, data loss is generally not acceptable. Assume
a procedure is called for the event. If some delay is
acceptable, Thread.blocking_uninterruptible(klass) { I/O }
can be used. If the delay is also not acceptable, it is a
difficult problem. However, if the event is caused by
Thread#raise, I'm not sure why the procedure is not called
directly instead of via Thread#raise. If the event is
caused by a signal, I have no good idea how to do it.

So I think asynchronous events should be used only for
termination. This is why I think an exception during a
blocking operation is tolerable.
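The interruptible/uninterruptible distinction proposed here eventually shipped, under a different name, as Thread.handle_interrupt in Ruby 2.0. A minimal sketch of the "delay termination, lose no work" case; the mapping to the proposal above is my reading, not part of the original discussion:

```ruby
# Sketch using Thread.handle_interrupt (Ruby 2.0+): inside a :never
# region, an async exception is queued rather than delivered, so the
# protected work completes before the thread is terminated.
log = Queue.new
t = Thread.new do
  Thread.handle_interrupt(RuntimeError => :never) do
    log << :start
    sleep 0.1          # the async exception is deferred, not delivered here
    log << :finished   # the work completes despite the pending raise
  end
end
log.pop                             # wait until the thread is in the region
t.raise RuntimeError, "terminate"   # queued until the :never region exits
delivered = begin
  t.join
  false
rescue RuntimeError
  true   # delivered only after the protected region finished
end
```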

I think the specific problem mentioned was that asynchronous exceptions
can abort the cleanup within an ensure section. So what about a simple
solution to that specific problem: if an asynchronous exception is
raised inside an ensure section, the exception is queued and raised at
the end of the ensure section. Is that too naive?

In article [email protected],
Charles Oliver N. [email protected] writes:

As mental has said, if threads are uninterruptible by default, this
would make ensures safe. I think that’s the only reasonable option.

If threads are uninterruptible by default for all kinds of
asynchronous events, SIGINT and other signals have no
effect by default. That is not reasonable.

So the default should depend on the kind of asynchronous
event. I think it is reasonable that signals are
interruptible by default but other events are
uninterruptible by default.

On Mon, Mar 24, 2008 at 09:41:29PM +0900, Tanaka A. wrote:

So the default should depend on the kind of asynchronous
event. I think it is reasonable that signals are
interruptible by default but other events are
uninterruptible by default.

I think this is reasonable too, especially given that we can turn
signals off using trap.

Paul
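Paul's point about trap can be seen in a few lines: a handler runs code in response to a signal without interrupting any thread with an exception. POSIX-style signal delivery is assumed here:

```ruby
# Sketch: handle SIGINT with a trap handler instead of letting the signal
# interrupt a thread asynchronously. Assumes POSIX-style signal delivery.
handled = false
previous = Signal.trap("INT") { handled = true }  # install our handler
Process.kill("INT", Process.pid)                  # send ourselves SIGINT
sleep 0.1 until handled                           # wait for the handler
Signal.trap("INT", previous)                      # restore old disposition
```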

On Thu, Mar 20, 2008 at 07:21:06AM +0900, MenTaLguY wrote:

Now, the question with IO and asynchronous exceptions in Ruby is
whether there is a good way to report a partial result?

The existing API already returns the number of bytes successfully
written for both IO#write and IO#syswrite. But how do we find out what
caused the operation to be interrupted? Currently IO#write gives an
exception:

irb(main):001:0> r, w = IO.pipe
=> [#<IO:0x4039f734>, #<IO:0x4039f70c>]
irb(main):002:0> t = Thread.new { w.write("HELLO" * 20000) }
=> #<Thread:0x4039a0a4 run>
irb(main):003:0> r.close
=> nil
irb(main):004:0> t.join
Errno::EPIPE: Broken pipe
        from (irb):2:in `write'
        from (irb):4:in `join'
        from (irb):4

and IO#syswrite gives the number of bytes written:

irb(main):018:0> sock = TCPSocket.new('localhost', 2000)
=> #<TCPSocket:0x4043bf1c>
irb(main):019:0> t = Thread.new { p sock.syswrite("HELLO" * 200000) }
=> #<Thread:0x40428804 run>
irb(main):024:0> t.join
131072
=> #<Thread:0x4040d324 dead>

I think the behavior of IO#syswrite is correct (because there is only
one write(2) system call), but IO#write actually got both partial
success and an exceptional condition, so what should its behavior be?

Paul
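One answer to Paul's question is to keep the running byte count in the caller, so an interruption does not discard it. A sketch where the "interruption" is a full pipe buffer reported by IO#write_nonblock; it assumes the platform's pipe buffer (typically 64 KiB) is smaller than the payload:

```ruby
# Sketch: keep the partial result (bytes written so far) in the caller,
# so an interrupted write does not discard it. The interruption here is
# a full pipe buffer, reported by IO#write_nonblock as IO::WaitWritable.
r, w = IO.pipe
data = "HELLO" * 40_000   # 200,000 bytes, larger than a typical pipe buffer
written = 0
begin
  while written < data.bytesize
    written += w.write_nonblock(data.byteslice(written, data.bytesize))
  end
rescue IO::WaitWritable
  # pipe is full; `written` still records exactly how far we got
end
```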

On Mon, 24 Mar 2008 22:17:34 +0900, Daniel DeLorme [email protected]
wrote:

I think the specific problem mentioned was that asynchronous exceptions
can abort the cleanup within an ensure section. So what about a simple
solution to that specific problem: if an asynchronous exception is
raised inside an ensure section, the exception is queued and raised at
the end of the ensure section. Is that too naive?

Maybe slightly. I don’t think it would necessarily be an issue for the
current implementation of 1.9, but you do also need to do the
registration/unregistration of ensure and catch clauses in an
uninterruptible fashion.

Basically you need to have the equivalent of this:

Thread.uninterruptable do
  begin
    Thread.interruptable do
      # …
    end
  rescue
    # …
  ensure
    # …
  end
end

(in cases where the thread is currently interruptable)
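This nesting maps directly onto what later shipped as Thread.handle_interrupt in Ruby 2.0. A runnable sketch (the mapping is my reading, not part of the original proposal) where the ensure clause runs to completion even though the exception arrived inside the interruptible region:

```ruby
# Sketch of the same nesting with Thread.handle_interrupt (Ruby 2.0+):
# the async exception lands in the inner :immediate region, while the
# ensure clause runs under the outer :never region and cannot be aborted.
log = []
t = Thread.new do
  begin
    Thread.handle_interrupt(RuntimeError => :never) do
      begin
        Thread.handle_interrupt(RuntimeError => :immediate) do
          log << :work
          sleep            # interruptible: the exception is delivered here
        end
      ensure
        log << :cleanup    # protected: runs with interrupts deferred
      end
    end
  rescue RuntimeError
    log << :rescued
  end
end
Thread.pass until t.stop?  # wait until the thread is asleep
t.raise RuntimeError
t.join
# log is [:work, :cleanup, :rescued]
```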

See also section 4.2 of “Asynchronous Exceptions in Haskell”:

http://citeseer.ist.psu.edu/415348.html

-mental

On Mon, 24 Mar 2008 21:41:29 +0900, Tanaka A. [email protected] wrote:

So the default should depend on the kind of asynchronous
event. I think it is reasonable that signals are
interruptible by default but other events are
uninterruptible by default.

This seems sensible to me, though it would be nice to also have
an alternate way to handle POSIX signals that didn’t involve
interruption.

-mental

In article b52213a47751467b7fcc5c5e245fdba4@localhost,
MenTaLguY [email protected] writes:

Now, the question with IO and asynchronous exceptions in Ruby is
whether there is a good way to report a partial result?

I think defining partial results leads, ultimately, to an
event-driven architecture.

A partial result is similar to a continuation.

For syswrite, it is the number of bytes written. It can be
used to resume syswrite.

If IO#gets returns a partial result on an interrupt, it
should be usable to resume IO#gets. It may contain the
bytes read. If the IO converts character encodings, it may
also contain the characters converted so far, the state of
the conversion engine, and the bytes not yet converted.

If an HTTP request reading procedure returns a partial
result on an interrupt, it should be usable to resume the
procedure. It contains the state of the HTTP request
parser. If the parser is recursive descent, it contains the
stack. That is the continuation.

So defining the partial result of a complex I/O operation
such as HTTP request reading amounts to translating a
continuation into a data structure. That is what an
event-driven architecture does.

If a program is not event driven, I think defining partial
results for complex I/O operations is too tedious.
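Tanaka's point, that a resumable partial result is a continuation reified as data, can be illustrated with a tiny buffered line reader. The class name and API below are invented for the sketch:

```ruby
require 'stringio'

# Sketch of a partial result reified as explicit state: the bytes already
# read live in a buffer, so an interrupt during readpartial loses nothing
# and gets can simply be retried. Class name and API are invented here.
class ResumableLineReader
  attr_reader :buffer   # the partial result: bytes read but not returned

  def initialize(io)
    @io = io
    @buffer = +""
  end

  def gets
    until (i = @buffer.index("\n"))
      @buffer << @io.readpartial(4096)  # an interrupt here loses no state
    end
    @buffer.slice!(0..i)                # consume and return one line
  end
end

reader = ResumableLineReader.new(StringIO.new("GET / HTTP/1.0\r\n\r\n"))
reader.gets  # => "GET / HTTP/1.0\r\n"
```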

On Mon, 24 Mar 2008 21:22:35 +0900, Tanaka A. [email protected] wrote:

So I think asynchronous events should be used only for
termination. This is why I think exception on blocking
operation is tolerable.

I think I agree. However, that still leaves us with timeouts, which
are sometimes done for operations that will be retried in the future,
so data loss is not acceptable there.

Won’t we also need to augment the existing stdlib APIs to allow the
specification of timeouts for any blocking operations that can’t
already be done with non-blocking and select?

-mental
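The select-based shape mental mentions can be sketched as follows; `ReadTimeout` and `read_with_timeout` are invented names for illustration, not stdlib API:

```ruby
# Sketch: a per-operation read timeout built on IO.select, with no
# Thread#raise involved. ReadTimeout and read_with_timeout are invented
# names, not stdlib API.
ReadTimeout = Class.new(StandardError)

def read_with_timeout(io, maxlen, seconds)
  ready = IO.select([io], nil, nil, seconds)
  raise ReadTimeout, "no data within #{seconds}s" unless ready
  io.readpartial(maxlen)  # select reported readable, so this won't block
end

r, w = IO.pipe
w.write("hello")
read_with_timeout(r, 5, 1.0)  # => "hello"
```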

In article 7f2500ea06b9d4ae2a8ac86ebf513d10@localhost,
MenTaLguY [email protected] writes:

This seems sensible to me, though it would be nice to also have
an alternate way to handle POSIX signals that didn’t involve
interruption.

Thread.delay_interrupt(klass) { … } is intended for that.

Hm. It may need to take "SIGTERM", etc., because Ruby has
no exception class corresponding to an individual signal
such as SIGTERM.

In article 5ec9772607938ba6d0665ce1905904eb@localhost,
MenTaLguY [email protected] writes:

I think I agree. However, that still leaves us with timeouts, which
are sometimes done for operations that will be retried in the future,
so data loss is not acceptable there.

For net/http, data loss is not a big problem because a
timed-out request is not resumed. The timeout terminates
the HTTP request.

I'm not sure about your assumption. If an operation will be
resumed later, why does the operation need a timeout? What
task should be done between the timeout and the resumption?
Why shouldn't that task be done in another thread?

Won’t we also need to augment the existing stdlib APIs to allow the
specification of timeouts for any blocking operations that can’t
already be done with non-blocking and select?

It is reasonable, I think.

On Tue, 2008-03-25 at 19:33 +0900, Tanaka A. wrote:

I'm not sure about your assumption. If an operation will be
resumed later, why does the operation need a timeout? What
task should be done between the timeout and the resumption?
Why shouldn't that task be done in another thread?

That’s a good point; if another thread is used most of the cases I was
thinking of are unnecessary. So I think I’m satisfied.

-mental

Basically you need to have the equivalent of this:

Thread.uninterruptable do
  begin
    Thread.interruptable do
      # …
    end
  rescue
    # …
  ensure
    # …
  end
end

Yeah, that would work. That, or, as one poster put it, only use
exceptions to kill, not to raise (to snuff out the non-determinism and
force users to use queues). Both of those would work.
Good luck.
-R