Nonblocking IO read

robertjames · November 7, 2006, 10:52am

On 1 Nov 2006, at 01:33, removed_email_address@domain.invalid wrote:

entire process
when waiting on IO so not an option here.

I thought Ruby internally uses non-blocking I/O in order to avoid
that a green thread reading something blocks every other thread: am I
wrong?
Or is this true just under unix?

robertjames · November 7, 2006, 10:52am

From: removed_email_address@domain.invalid

(I’m supporting 'nix and Windows)

Windows will be a problem. Admittedly, I haven’t tried
ruby 1.8.5 yet, which has new *_nonblock methods (read_nonblock,
write_nonblock and friends). However, my expectation is that you’ll
only get nonblocking behavior on Windows from sockets, not from pipes.

On Windows, calling select() on a pipe always returns
immediately with “data ready to read”, regardless of whether
there’s any data there.

This has been the bane of my existence on Windows ruby for
5 or 6 years. I do IPC on Windows ruby using TCP over
loopback, instead of pipes, in order to get nonblocking
semantics. (That still doesn’t help for reading from the
console, though… (search the archives for ‘kbhit’ for a
partial solution there…))
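
A minimal sketch of the TCP-over-loopback approach described above, assuming a current Ruby with the standard `socket` library. The helper name `loopback_pair` is hypothetical and not from the original post; it stands in for `IO.pipe` by returning two connected sockets, which select() can watch on Windows as well as on Unix.

```ruby
require 'socket'

# Hypothetical helper: returns two connected TCP sockets bound to the
# loopback interface, usable where IO.pipe would otherwise be used.
def loopback_pair
  server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
  port   = server.addr[1]
  client = TCPSocket.new('127.0.0.1', port)
  peer   = server.accept
  server.close
  [peer, client]
end

reader, writer = loopback_pair

# Nothing has been written yet, so select() times out and returns nil.
ready = IO.select([reader], nil, nil, 0.1)
puts "before write: #{ready.inspect}"

writer.write("hello\n")
ready = IO.select([reader], nil, nil, 1.0)
puts "after write:  #{reader.gets.inspect}" if ready
```

In a real setup the child process would connect to the port itself; both ends live in one process here only to keep the sketch self-contained.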

One of these years, I’d like to chat with a Windows guru
and ask how he/she would recommend making a select() that
works on both sockets and pipes on Windows. Ruby could
really use one.

Regards,

Bill

robertjames · November 7, 2006, 10:52am

On 11/1/06, Bill K. removed_email_address@domain.invalid wrote:

This has been the bane of my existence on Windows ruby for

Regards,

Bill

I think part of the problem here is that on Windows, functions like _pipe()
were added primarily as hacks to make it easier to migrate programs from
Unix, and never completely implemented. It may be too much to ask for them
to play nice in the sandbox with select().

robertjames · November 7, 2006, 10:52am

On Thu, 2 Nov 2006, Bill K. wrote:

console, though… (search the archives for ‘kbhit’ for a partial solution
there…))

One of these years, I’d like to chat with a Windows guru and ask how he/she
would recommend making a select() that works on both sockets and pipes on
Windows. Ruby could really use one.

maybe

Apache Portable Runtime: Poll Routines

-a

robertjames · November 7, 2006, 10:52am

On Thu, 2 Nov 2006, Gabriele M. wrote:

I thought Ruby internally uses non-blocking I/O in order to avoid that a
green thread reading something blocks every other thread: am I wrong? Or is
this true just under unix?

try this on windows

harp:~ > cat a.rb
t = Thread.new{ loop{ STDERR.puts Time.now.to_f } }
STDIN.gets

-a

robertjames · November 7, 2006, 10:53am

On Nov 1, 2006, at 11:30 AM, Robert K. wrote:

Tom P. wrote:

Anyway, you would only need nonblocking IO if you wanted to read
bits of the stderr stream before the command exited, but that
doesn’t sound like what you want.

Actually this is not correct: if there is a lot written to stderr
then you need to read that concurrently. If you do not do that
then the process will block on some stderr write operation that
fills up the pipe and you get a deadlock because your code waits
for process termination.
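
A minimal sketch of reading both streams concurrently so the child can never stall on a full stderr pipe. It assumes a current Ruby (native threads and the standard `open3` library), not the 1.8 green threads discussed in this thread. The helper name `run_and_drain` and the stand-in child command are hypothetical.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: runs a command and returns [stdout, stderr, status],
# reading both streams at the same time so neither pipe can fill up.
def run_and_drain(*cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }
    [out_reader.value, err_reader.value, wait_thr.value]
  end
end

# Stand-in child (an assumption for the demo): writes about 1 MB to stderr,
# far more than a pipe buffer holds, then one line to stdout.
child = 'STDERR.write("x" * 1_000_000); STDOUT.puts "done"'
out, err, status = run_and_drain(RbConfig.ruby, '-e', child)
puts "stdout=#{out.inspect} stderr=#{err.bytesize} bytes ok=#{status.success?}"
```

Reading stdout to the end first and stderr afterwards would hang on this child, which is the deadlock described above. `Open3.capture3` wraps the same idea in a single call.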

I guess I can see that, though I can’t think of a program that I’d
expect to be able to generate enough stderr output to clog a pipe.
In any case, my response would be to merge stdout and stderr, rather
than use non-blocking IO. If you’re just reading one stream while
the command is executing, you don’t need to worry about blocking.
I’m certainly with Ara in recommending that if you can avoid non-
blocking IO, you should.

At the risk of starting an unrelated discussion (“stderr considered
harmful”), my feeling has long been that stderr is misused by most
people, and that the only context in which it makes any sense is for
small commandline tools that you expect to use in a pipeline. For
apps like that, it’s helpful to keep error messages out of your
stdout stream. For most apps, however, I don’t think it makes any
sense to write error messages to a separate file. Error messages
should be written to the app’s main log file or output file, where
the user will be looking for their results. That way, nonfatal error
messages also appear naturally in the proper sequence with other
output. I work with a lot of scientist programmers who don’t think
much about issues like this and (typically) write their error
messages to stderr just because it’s there. I’m not sure that’s
relevant to the OP’s situation or not. (Probably not.)

Tom

robertjames · November 7, 2006, 10:52am

From: “Gabriele M.” removed_email_address@domain.invalid

you want to do something else while waiting for IO.

right. but ruby’s thread are green and, on windows, block the
entire process when waiting on IO so not an option here.

I thought Ruby internally uses non-blocking I/O in order to avoid
that a green thread reading something blocks every other thread: am I
wrong?
Or is this true just under unix?

Ruby uses select() internally, and Windows doesn’t support
select() on pipes, just sockets.

Regards,

Bill

robertjames · November 7, 2006, 10:53am

Tom P. wrote:

pipe and you get a deadlock because your code waits for process
termination.

I guess I can see that, though I can’t think of a program that I’d
expect to be able to generate enough stderr output to clog a pipe.

A typical pipe buffer size is 4k which can get filled pretty fast.
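
The capacity differs between systems and kernel versions, so it is easy to check on a given machine. A rough sketch, assuming a Unix-like platform where pipes support nonblocking writes (per the discussion above, this is not expected to work on Windows pipes). The helper name `pipe_capacity` is hypothetical.

```ruby
# Hypothetical helper: counts how many bytes fit into an unread pipe before
# a nonblocking write would have to wait.
def pipe_capacity(chunk_size = 1024)
  reader, writer = IO.pipe
  chunk = 'x' * chunk_size
  total = 0
  loop do
    written = writer.write_nonblock(chunk, exception: false)
    break if written == :wait_writable   # the pipe is full
    total += written
  end
  total
ensure
  reader.close if reader
  writer.close if writer
end

puts "pipe holds #{pipe_capacity} bytes before a writer would block"
```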

In
any case, my response would be to merge stdout and stderr, rather than
use non-blocking IO. If you’re just reading one stream while the
command is executing, you don’t need to worry about blocking.

Merging both from outside the subprocess is certainly possible from Ruby
(via a shell) but I am not sure, whether there is a portable solution
(one of the popenN methods?).
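
For reference, current Ruby versions can merge the two streams without a shell. A small sketch, assuming the standard `open3` library as shipped with Ruby 1.9 and later (so not available to the 1.8 versions discussed here). The child command is a stand-in for the demo.

```ruby
require 'open3'
require 'rbconfig'

# Stand-in child process (an assumption for the demo): writes to both streams.
child = 'STDOUT.sync = true; STDOUT.puts "result"; STDERR.puts "warning"'

# capture2e redirects the child's stderr into its stdout, so the parent
# reads a single stream and no shell "2>&1" is needed.
merged, status = Open3.capture2e(RbConfig.ruby, '-e', child)
puts merged
puts "exit ok: #{status.success?}"
```

With only one pipe to read, the parent cannot deadlock on an unread stderr, at the cost of no longer being able to tell the two streams apart.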

I’m
certainly with Ara in recommending that if you can avoid non-blocking
IO, you should.

I second that.

a lot of scientist programmers who don’t think much about issues like
this and (typically) write their error messages to stderr just because
it’s there. I’m not sure that’s relevant to the OP’s situation or not.
(Probably not.)

All true. But if you do not know what the program does or how it is
implemented you better deal with potential output to stderr (either by
merging, see above, or by making sure that stderr and stdout are read)
because otherwise the consequences might be somewhat catastrophic. And
this could also mean that your program at some point in the future
simply stops working because another piece of software has changed.

Kind regards

robert

robertjames · November 7, 2006, 10:53am

On Thu, 2 Nov 2006, Tom P. wrote:

get a deadlock because your code waits for process termination.
I guess I can see that, though I can’t think of a program that I’d expect to
be able to generate enough stderr output to clog a pipe.

did you check out my recent post (switched subjects) - it takes a surprisingly
small amount (4242 lines of output does it easily)!

In any case, my response would be to merge stdout and stderr, rather than
use non-blocking IO. If you’re just reading one stream while the command is
executing, you don’t need to worry about blocking. I’m certainly with Ara
in recommending that if you can avoid non-blocking IO, you should.

no argument there… but

much about issues like this and (typically) write their error messages to
stderr just because it’s there. I’m not sure that’s relevant to the OP’s
situation or not. (Probably not.)

i’m in the same boat as you (writing for scientists) and have found it’s
nearly always the case that a program can produce something useful in
stdout and therefore always log to stderr so that programs can be used in
pipes.

mostly i agree though.

cheers.

-a

robertjames · November 7, 2006, 10:54am

you are comparing apples and oranges here. you have to look at where
these things fit into the big picture. the above mentioned libraries
provide abstractions that are used to provide a more convenient
programming model, and at least in the case of libevent, provide an
abstraction that insulates code from having to deal with supporting
the various readiness selection APIs that exist. they do not replace
the underlying OS interfaces upon which they are built.

JSR-000051 does more or less the same thing, but with a slightly
different goal in mind. the primary goal is to provide the programmer
with access to asynchronous network IO in a platform-agnostic
manner. the emphasis is on providing a sensible abstraction that works
well with the various operating system APIs that exist – exposing a
reasonable common subset of functionality that one can expect to be
able to support on a variety of platforms. JSR-000051 provides the
primitives on top of which you would implement IO frameworks that
provide convenient programming models.

for a language, providing sensible APIs for primitives is more
important than imposing a particular programming model. given a set
of sensible primitives that can be widely supported, whatever higher
level frameworks one wishes to create can be built atop that.

| in addition it’s a hack in the c lib of many oses.

I am not sure I understand exactly what you mean. the way I see it
there are three distinct levels to networking APIs:

1. OS API, that is, the system calls
2. standardized system libraries, like libc, java.io, java.nio
   etc, which provide access to IO primitives and possibly abstract
   away the underlying OS APIs
3. higher level IO frameworks that provide more convenient programming
   models

libevent would overlap with both 2 and 3 in this case since its
mission is both to abstract away the underlying OS interfaces and
provide a convenient programming model. Java’s NIO is what you’d find
in 2 and a typical Reactor pattern implementation would be in 3.

(if you use low-level socket APIs (ie the types of functions
documented in section 2 of the UNIX manual pages) on UNIX you will
find yourself using a mix of 1 and 2 if you use C/C++ since you use
wrappers in libc to perform system calls, but this is just a very,
very thin convenience layer on top of the system calls. if this is
confusing then just forget I mentioned it :-) )

| > for examples of use you might want to check out the Reactor pattern
| > and other patterns for concurrent programming.
|
| afaik the reactor pattern is a synchronous pattern

no, the Reactor pattern is mainly used to implement asynchronous IO
and is not really anything new. I’ve both written and seen variations
of this pattern in a multitude of languages since I started writing
networking code in the early 90s and the only thing that has really
changed is that we’ve gotten better at classifying these types of
patterns, giving them better definitions, and giving them names. (when I
started writing networking software in the early 90s, “patterns”
wasn’t commonly part of the programmer’s vocabulary.)
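
To make the pattern concrete, here is a toy reactor in Ruby. It is only a sketch under stated assumptions: the class name `TinyReactor` and its methods are hypothetical and belong to none of the libraries mentioned in this thread. The pipe in the demo assumes a Unix-like platform.

```ruby
# A toy reactor: callers register a handler per IO object, and the loop
# dispatches to a handler whenever select() reports that IO as readable.
class TinyReactor
  def initialize
    @handlers = {}
  end

  def register(io, &handler)
    @handlers[io] = handler
  end

  def unregister(io)
    @handlers.delete(io)
  end

  # Runs until no handlers remain or select() times out.
  def run(timeout = 1.0)
    until @handlers.empty?
      ready, = IO.select(@handlers.keys, nil, nil, timeout)
      break unless ready
      ready.each { |io| @handlers[io]&.call(io) }
    end
  end
end

reactor = TinyReactor.new
reader, writer = IO.pipe
writer.puts "ping"
writer.close

reactor.register(reader) do |io|
  data = io.read_nonblock(4096, exception: false)
  if data.nil?
    reactor.unregister(io)   # EOF: stop watching this IO
    io.close
  elsif data != :wait_readable
    puts "got #{data.inspect}"
  end
end
reactor.run
```

The application code never waits on a particular IO; it only reacts when the loop says data is available.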

| artima - Comparing Two High-Performance I/O Design Patterns
|
| not unlike the model of libevent and liboop - which are both synchronous…
| am i missing something?

you are missing an “a” in front of “synchronous”

-Bjørn

robertjames · November 7, 2006, 11:03am

On Sun, 5 Nov 2006, S. Robert J. wrote:

Windows as well?
check out systemu

(As an aside, kudos to the developers of popen4 - it’s really great.)

i am 99% positive that the implementation of popen4 does not play well with
windows and it may be impossible to make it do so. the systemu package i just
released is my attempt at an alternate implementation. give it a while and
let me know how it goes.

regards.

-a

robertjames · November 7, 2006, 11:02am

Yep, that’s how I came across this problem initially.

Doing that (replace ‘cat -’ with my command) hung indefinitely.
Commenting out the stderr line fixed it. I assume that it was waiting
for something to write to stderr before progressing.

Now, you are correct that the external process had terminated. Why
that didn’t close stderr and move on I do not know. More importantly -
is there a way to do what Tom is suggesting - that is, have Ruby move
on the second the external process terminates - that will work on
Windows as well?

(As an aside, kudos to the developers of popen4 - it’s really great.)

robertjames · November 7, 2006, 11:03am

removed_email_address@domain.invalid wrote:

it’s exactly things like eventmachine that make me say using nbio is archaic -
i don’t need to handle the complexities of nbio when powerful abstractions
like it exist!

The problem is that all of these “powerful abstractions” are
ridiculously slow compared to a well written nbio approach for many
types of applications. Particularly as long as Ruby’s threading is so
abysmal.

Try writing a network server that needs to handle a high number of
concurrent connections, and you’ll quickly find “select()” taking most
of your CPU if you use a model that makes use of threading and blocking
IO - your only real choice to get decent performance out of Ruby for
that kind of app is multiplexing the processing manually using nbio
(which is what Ruby is trying to do behind the scenes, but fails
miserably at doing effectively once the number of threads gets high
enough) or fork instead which has its own problems if you need to
share significant state.

This is from personal experience - I currently have a guy on my team
rewriting an important backend process because we started running into
those exact issues.

Even when Ruby’s threading is sorted out so we won’t run into these
problems, nbio will be vital for high performance network programming -
well done nbio reduces the number of syscalls, and thereby context
switches enormously.

Vidar

robertjames · November 7, 2006, 11:03am

removed_email_address@domain.invalid wrote:

it can’t be done. search the archives, this sort of thing almost always
indicates a design flaw. for instance - what will your program do if there is
no input?

I have a messaging middleware server running right now that is
processing millions of messages a day. It’s written in Ruby, and does
all its work in a single thread using IO multiplexing with select().

Not only can it be done - it is fairly easy (~700 lines for the entire
app, including db persistence support etc.). Doing the same thing with
threads and blocking IO, on the other hand, would fall apart horribly
due to the way Ruby does threading. Forking also wouldn’t be an option
as the processes would still need to actually exchange those messages.
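
A rough sketch of the single-threaded select() approach, assuming a current Ruby. This is not the server described above; the helper name `each_message` is hypothetical. Pipes are used only to keep the demo short; given the Windows limitations discussed earlier in the thread, sockets would be needed there.

```ruby
# Hypothetical helper: reads newline-terminated messages from several IO
# objects in one thread, yielding [io, message] as each line completes.
def each_message(sources, timeout = 1.0)
  buffers = {}
  sources.each { |io| buffers[io] = String.new }
  until buffers.empty?
    ready, = IO.select(buffers.keys, nil, nil, timeout)
    break unless ready                      # nothing arrived in time
    ready.each do |io|
      chunk = io.read_nonblock(4096, exception: false)
      next if chunk == :wait_readable       # not actually ready, retry later
      if chunk.nil?                         # EOF: stop watching this IO
        buffers.delete(io)
        next
      end
      buffers[io] << chunk
      # Hand over every complete line; a partial line stays buffered.
      while (line = buffers[io].slice!(/\A.*\n/))
        yield io, line.chomp
      end
    end
  end
end

r1, w1 = IO.pipe
r2, w2 = IO.pipe
w1.write("alpha\nbe")     # second message arrives in two pieces
w2.write("gamma\n")
w1.write("ta\n")
w1.close
w2.close

each_message([r1, r2]) do |io, msg|
  puts "#{io == r1 ? 'first' : 'second'} source: #{msg}"
end
```

Each connection keeps its own buffer, so a slow or partial sender never blocks the others.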

In fact, internally, Ruby does all its IO using non-blocking IO
exactly because the threading model would cause everything to block
otherwise. Incidentally that’s also one of the reasons why using
threads + blocking IO performs extremely badly in Ruby once the number
of threads gets beyond a certain level, because it causes Ruby to call
select() far too often.

Vidar

robertjames · November 7, 2006, 11:03am

From: “Vidar H.” removed_email_address@domain.invalid

removed_email_address@domain.invalid wrote:

it’s exactly things like eventmachine that make me say using nbio is archaic -
i don’t need to handle the complexities of nbio when powerful abstractions
like it exist!

The problem is that all of these “powerful abstractions” are
ridiculously slow compared to a well written nbio approach for many
types of applications. Particularly as long as Ruby’s threading is so
abysmal.

Sounds like you might want to actually take a look at eventmachine,
then.

Regards,

Bill