Asynchronous HTTP request

Does anyone know how to do the following, but without threads, purely
with asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("", website.value)

I’m not sure I understand EventMachine, but it doesn’t seem like this
code fits with the “event loop” model. Besides, I don’t want to react to
every chunk of data that comes in; I just want the result at the end.

Thanks.

Daniel DeLorme wrote:

Does anyone know how to do the following, but without threads, purely
with asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("", website.value)

Depends what you mean by “with asynchronous IO”. Do you want to keep
calling select() and then only read data when it’s available? Then you’re
basically rewriting eventmachine or io-reactor.
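To make "keep calling select()" concrete, a minimal polling sketch (using a pipe to stand in for the HTTP socket, since the mechanics are the same):

```ruby
r, w = IO.pipe
w.write("data")

# poll once with a zero timeout: IO.select returns nil when nothing
# is readable yet, or the ready IO objects when data is waiting
ready = IO.select([r], nil, nil, 0)
chunk = r.read_nonblock(4) if ready
puts chunk  # => data
```

An event loop is essentially this polling wrapped in a dispatch loop, which is why rolling your own tends to converge on eventmachine or io-reactor.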

Otherwise, you can do res.read_body with a block - it will be called for
each chunk. But read_body will still block until the body is complete.

I’m not sure I understand EventMachine, but it doesn’t seem like this
code fits with the “event loop” model. Besides, I don’t want to react to
every chunk of data that comes in; I just want the result at the end.

But if you don’t want the code to block until the body has been read, but
don’t want the read to take place in another thread, then what do you
want?

What’s the problem with threads anyway? Being able to do one thing while
you’re waiting for something else to complete is exactly what they’re
for.

Brian C. wrote:

basically rewriting eventmachine or io-reactor.
I mean nonblocking. I don’t want to keep calling select(), I just want
to call it once, when I’m ready to process the data I asked for.

But if you don’t want the code to block until the body has been read, but
don’t want the read to take place in another thread, then what do you
want?

I just want to issue the http request, do other stuff while the request
goes on its merry way, let the response accumulate at the socket, and
read the data when I’m ready to. If at that time the response has
accumulated at the socket then I don’t have to wait, otherwise block
until the data has finished coming in.
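In other words, something like this, with a pipe standing in for the socket (the OS buffers the incoming data until I ask for it):

```ruby
r, w = IO.pipe

w.write("hello")   # the response accumulates in the kernel buffer
answer = 40 + 2    # meanwhile, do other work

data = r.read(5)   # already buffered, so this returns immediately
puts data          # => hello
```

With a real socket the only difference is that the read would block if the response hadn’t finished arriving yet.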

What’s the problem with threads anyway? Being able to do one thing while
you’re waiting for something else to complete is exactly what they’re
for.

I can’t agree with that. Threads are meant to achieve concurrency,
meaning the concurrent (or at least interleaved) execution of
instructions. If the only thing the IO thread does is wait for the data
and then exit, there’s nothing concurrent happening; it’s just a way to
simulate nonblocking IO. And creating a thread just for that seems to me
like the proverbial jackhammer to drive in a nail, especially since 1.9
threads are no longer green.

Given that nonblocking IO is a paradigm that’s been around for ages I
was kinda hoping there was a neat & tidy way of doing it (a gem maybe?)
but I haven’t found it.

Daniel DeLorme wrote:

I just want to issue the http request, do other stuff while the request
goes on its merry way, let the response accumulate at the socket, and
read the data when I’m ready to.

Hmm. Well you can delay reading the body like this:

http = Net::HTTP.start(…)
res = http.get(…)
… do some stuff
answer = res.read_body

but it’ll wait for the response headers before get() returns. So, you
should just pick the bits you need out of /usr/lib/ruby/1.8/net/http.rb
directly.

Note that get() just calls request(Get.new(…)), which takes you here:

def request(req, body = nil, &block)  # :yield: +response+
  unless started?
    start {
      req['connection'] ||= 'close'
      return request(req, body, &block)
    }
  end
  if proxy_user()
    unless use_ssl?
      req.proxy_basic_auth proxy_user(), proxy_pass()
    end
  end

  req.set_body_internal body
  begin_transport req
    req.exec @socket, @curr_http_version, edit_path(req.path)
    begin
      res = HTTPResponse.read_new(@socket)
    end while res.kind_of?(HTTPContinue)
    res.reading_body(@socket, req.response_body_permitted?) {
      yield res if block_given?
    }
  end_transport req, res

  res
end

You can see there how to send the request (req.exec), and subsequently
how to read the response from the socket.
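For a self-contained illustration of those mechanics (with a trivial in-process server standing in for the remote host), the send-now-read-later flow is just:

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

# a one-shot toy HTTP server so the sketch needs no network
Thread.new do
  client = server.accept
  client.gets("\r\n\r\n")                       # consume the request headers
  client.write("HTTP/1.0 200 OK\r\n\r\nbody")
  client.close
end

sock = TCPSocket.new("127.0.0.1", port)
sock.write("GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")  # request is sent

# ... compute_lots_of_stuff() would run here ...

raw = sock.read                                 # block only now, for the reply
body = raw.split("\r\n\r\n", 2)[1]
puts body  # => body
```

That’s the essence of what req.exec and HTTPResponse.read_new do, minus header parsing, chunked encoding, and all the other things Net::HTTP handles for you.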

On Thu, 2010-05-13 at 22:29 +0900, Daniel DeLorme wrote:

threads are no longer green.
I think you are trying to save the computer time at the expense of your
own. What you are asking to do is reimplement something that Thread
does very well, and without excessive resource usage. Even ruby1.8 can
run thousands of sleepy I/O threads without a problem. The reason you
can’t find another library for doing what you want is that everyone uses
threads. If you really want a jackhammer, use EventMachine :-)

So you didn’t want a Thread, but you’ll happily use a Fiber…

Brian C. wrote:

Hmm. Well you can delay reading the body like this:

http = Net::HTTP.start(…)
res = http.get(…)
… do some stuff
answer = res.read_body

but it’ll wait for the response headers before get() returns. So, you
should just pick the bits you need out of /usr/lib/ruby/1.8/net/http.rb
directly.

Thanks for your answer. It was a bit more low-level than I would’ve
liked, but it helped me get the creative juices flowing. In the end
my solution involved wrapping the request’s socket in a Fiber. Quite a
monkeypatch perhaps, but it seems to work:

class ASync
  class ASocket < BasicObject
    def initialize(socket)
      @socket = socket
    end
    def method_missing(name, *args, &block)
      ::Fiber.yield if name =~ /read/
      @socket.send(name, *args, &block)
    end
  end

  def initialize(uri, headers={})
    uri = URI.parse(uri) unless uri.is_a?(URI)
    @fiber = ::Fiber.new do
      Net::HTTP.start(uri.host, uri.port) do |http|
        http.instance_eval{ @socket = ASocket.new(@socket) }
        @response = http.get(uri.request_uri, headers)
      end
    end
    @fiber.resume # send the request
  end

  def method_missing(*args, &block)
    @fiber.resume until @response
    @response.send(*args, &block)
  end
end
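For anyone trying to follow the control flow, the core trick reduces to this (a pipe standing in for the HTTP socket):

```ruby
r, w = IO.pipe

response = nil
fiber = Fiber.new do
  # the real code sends the request here, then yields before reading
  Fiber.yield
  response = r.read(5)  # only runs once the fiber is resumed again
end

fiber.resume               # runs up to the yield ("request sent")
w.write("hello")           # the "response" arrives while we do other work
fiber.resume until response
puts response              # => hello
```

The first resume sends the request and hands control back before any read; later resumes finish the read, blocking only if the data hasn’t arrived yet.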

I’m not sure I understand EventMachine, but it doesn’t seem like this
code fits with the “event loop” model. Besides, I don’t want to react to
every chunk of data that comes in; I just want the result at the end.

It might fit well. Give it a shot.

http://eventmachine.rubyforge.org/EventMachine/Protocols/HttpClient.html

-rp

Brian C. wrote:

So you didn’t want a Thread, but you’ll happily use a Fiber…

Well, yes, a Fiber is just a coroutine, nothing like a thread.
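Right: a Fiber only runs when explicitly resumed and only suspends when explicitly told to, with values passed both ways, e.g.:

```ruby
fiber = Fiber.new do |x|
  y = Fiber.yield(x * 2)  # suspend, handing x * 2 back to the caller
  y + 1                   # returned by the final resume
end

a = fiber.resume(3)   # => 6
b = fiber.resume(10)  # => 11
p a, b
```

No scheduler, no preemption; just cooperative transfer of control.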

On 13 May 2010 17:37, Daniel DeLorme [email protected] wrote:

Thanks.

EventMachine is perfect for this kind of stuff. Whether it fits with the
rest of your web framework is more likely the thing that makes it an
unlikely selection (if you’re using anything Rack-based, for example)

http://eventmachine.rubyforge.org/EventMachine/Protocols/HttpClient.html
or
http://eventmachine.rubyforge.org/EventMachine/Protocols/HttpClient2.html

Both give an example of how this is used.

HTH
Daniel

On Thu, May 13, 2010 at 1:37 AM, Daniel DeLorme [email protected]
wrote:

Does anyone know how to do the following, but without threads, purely with
asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("", website.value)

Looks like you want futures, which can be provided by any number of
frameworks. A pretty awesome one to consider is dataflow, which is
based
off ideas from the Oz language:

MenTaLguY’s Omnibus library also provides futures, however I don’t
believe
it’s presently maintained:

http://rubyforge.org/projects/concurrent

You could check out Typhoeus: https://github.com/pauldix/typhoeus

It allows you to create parallel HTTP requests pretty easily.

There’s a library I wrote specifically with external HTTP requests in
mind. It uses threads and blocks on first access (no callbacks).

m = Muscle.new do |m|
  m.action(:users) do
    # get users from an external service
  end

  m.action(:slow_stuff, :timeout => 1.2) do
    # some unreliable action
  end

  # Set up a special timeout handler for the second action;
  # by default timeouts are set to 5 seconds
  m.on_timeout(:slow_stuff) do
    "Sorry but :slow_stuff timed out"
  end
end

m[:users] # blocks when accessed until the :users action is completed;
# the remaining actions continue in the background

Not sure if it helps your situation, but it’s simple and works
effectively.

Daniel N wrote:

rest of your web framework is more likely the thing that makes it an
unlikely selection (if you’re using anything rack based for example)

If you could show me how to use EventMachine in this case I’d be
grateful. I couldn’t figure out how to run compute_lots_of_stuff() while
the http requests are executing.

Daniel N wrote:

There’s a library I wrote specifically with external HTTP requests in mind.
It uses threads and blocks on first access (no callbacks).

https://github.com/hassox/muscle

Except that’s pretty much the same thing as my original example. And
indeed it’s simple and effective. It’s just that I happen to like the
concept of asynchronous IO so I would like to do it that way if
possible.

On 18 May 2010 14:19, Tony A. [email protected] wrote:

Looks like you want futures, which can be provided by any number of
frameworks. A pretty awesome one to consider is dataflow, which is based
off ideas from the Oz language:

https://github.com/larrytheliquid/dataflow

Oooh, that’s a pretty nifty concept… but again it relies on threads
to take care of the concurrency, which I already knew how to do.

On Tue, May 18, 2010 at 1:33 AM, Daniel DeLorme [email protected]
wrote:

https://github.com/larrytheliquid/dataflow

Oooh, that’s a pretty nifty concept… but again it relies on threads
to take care of the concurrency, which I already knew how to do.

That’s one way of looking at it. However, Erlang ultimately relies on
threads to take care of concurrency as well, and Erlang’s concurrency
model can do a far better job of managing those threads than you ever
could. The same goes for dataflow.

Daniel DeLorme wrote:

Brian C. wrote:

So you didn’t want a Thread, but you’ll happily use a Fiber…

Well, yes, a Fiber is just a coroutine, nothing like a thread.

Except that the semantics of Threads are well defined. You start them,
they do stuff, you join them.

Are you saying that a Fiber will return control to you when it blocks
due to lack of data on a socket, as well as when the Fiber explicitly
“yields”? What value does it return to you in the blocking case?

Testing suggests otherwise.

$ cat ert.rb
p1, p2 = IO.pipe

f = Fiber.new do
  puts "Starting fiber"
  p1.gets
  puts "Ending fiber"
end

sleep 0.5
puts "Point A"
f.resume
puts "Point B"

$ ruby19 ert.rb
Point A
Starting fiber

As far as I can see: the fiber starts processing when f.resume is
called, but blocks when p1.gets is called.

So AFAICS, your code which thinks it can do work while the HTTP
request is running, doesn’t. Rather, the HTTP request is not sent at all
until the Fiber#resume is called, and at that point it will block as
necessary until the whole response is received.

Brian C. wrote:

Are you saying that a Fiber will return control to you when it blocks
due to lack of data on a socket, as well as when the Fiber explicitly
“yields”? What value does it return to you in the blocking case?

As an aside, Fibers can behave in that fashion when used within a
“never block” architecture like EventMachine.

There’s also the neverblock library, which employs Fibers similarly:
http://www.espace.com.eg/neverblock

In my case, using a homegrown RPC library with Fibers, on top of
EventMachine, a method call on a ‘remote’ object suspends the Fiber
until the response is received:

result = remote_object.fornstaff(“dreelsprail”)

So, a method call on remote_object explicitly yields (actually,
using Fiber#transfer) behind the scenes so that other fibers may run
in the interim.

It’s interesting, so far, as a programming model. Since there’s
only a single thread, there’s no need for traditional concurrency
primitives like Mutexes.

On the other hand, reentrancy is still an issue. So if one is in
the process of modifying the state of an object, one does indeed
need to be aware when a method call might end up yielding the fiber.

…Highlighting the usefulness of approaches where “variables have
the property that they can only be bound/assigned to once” like
the dataflow library mentioned by Tony A. elsewhere in this
thread.

Regards,

Bill

On 18 May 2010 16:56, Daniel DeLorme [email protected] wrote:

puts template.sub("", website.value)

What context are you trying to do this in? Is it inside a rack request
(rails / merb / sinatra /pancake / other)? or is this in a stand alone
script?

Could you perhaps provide a bit of context for what you’re trying to
achieve?

Cheers
Daniel

Brian C. wrote:

Daniel DeLorme wrote:

Brian C. wrote:

So you didn’t want a Thread, but you’ll happily use a Fiber…
Well, yes, a Fiber is just a coroutine, nothing like a thread.

Are you saying that a Fiber will return control to you when it blocks
due to lack of data on a socket, as well as when the Fiber explicitly
“yields”? What value does it return to you in the blocking case?

Given that I just said a Fiber is nothing like a thread, I’m not sure
how you got the idea that I’m saying Fibers behave like threads (yield
control on IO).

So AFAICS, your code which thinks it can do work while the HTTP
request is running, doesn’t. Rather, the HTTP request is not sent at all
until the Fiber#resume is called, and at that point it will block as
necessary until the whole response is received.

I didn’t post that code without testing it. If you look at it a bit more
carefully maybe you’ll understand how it works. The HTTP request is sent
after the first Fiber#resume but the fiber yields before attempting to
read the response.