Asynchronous HTTP request

Daniel_DeLorme · May 19, 2010, 3:57am

Daniel N wrote:

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }

What context are you trying to do this in? Is it inside a rack request
(rails / merb / sinatra /pancake / other)? or is this in a stand alone
script?

Could you perhaps provide a bit of context for what you’re trying to achieve?

Ok, here’s the context. I didn’t put this in my OP because I figured
it would just bore everybody to tears.

This is inside a rack request. The idea is that I’m assembling a web
page by doing a bunch of sub-requests for the various parts of the
page. So I’ll have something like:

action "index" do
  @news = subreq("http://news.server")
  @ad = subreq("http://ad.server")
  @blog = subreq("http://blog.server")
  @forum = subreq("http://forum.server")
end

All these sub-requests are launched asynchronously and, while they are
executing, the app generates the layout within which the output of the
subrequests will be embedded. So I’ll have something like:

response = ['',
  '', @ad, '',
  '', @news, '',
  '', @blog, '',
  '', @forum, '',
  '']

And when rack finally outputs the response to the client it will block
on the various subrequests unless/until they have completed.
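
A minimal sketch of that idea, under stated assumptions: Subrequest and LazyBody are hypothetical names, not part of Rack or of the code discussed in this thread, and sleep stands in for the real Net::HTTP.get call. Each sub-request starts in its own thread, and the body blocks on a result only at the moment that fragment is emitted.

```ruby
# Hypothetical helper: starts the work in a background thread and returns
# an object whose #to_s blocks only when the result is actually needed.
class Subrequest
  def initialize(&work)
    @thread = Thread.new(&work)
  end

  # Thread#value joins the thread and returns the block's result.
  def to_s
    @thread.value.to_s
  end
end

# A Rack-style body: anything that responds to #each and yields strings.
class LazyBody
  def initialize(parts)
    @parts = parts
  end

  def each
    @parts.each { |part| yield part.to_s }
  end
end

# Simulated sub-requests; in the setting of this thread the block would be
# something like Net::HTTP.get(URI.parse(url)).
ad   = Subrequest.new { sleep 0.2; "AD" }
news = Subrequest.new { sleep 0.2; "NEWS" }

# The layout array is assembled immediately, while both threads are running.
body = LazyBody.new(["<div>", ad, "</div><div>", news, "</div>"])

output = String.new
body.each { |chunk| output << chunk }
```

The layout strings here are placeholders. The point is only that building the array never waits; the waiting happens inside each.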

What I can’t figure out with EventMachine is how to have the “main
thread” generate the layout while the subrequests are executing.

Daniel_DeLorme · May 19, 2010, 4:54am

On 19 May 2010 11:56, Daniel DeLorme wrote:

What I can’t figure out with EventMachine is how to have the “main
thread” generate the layout while the subrequests are executing.

Ok now we’re talking. So with rack you can’t do true async with a
callback. Rack is callstack based, meaning that you have to return the
value as the response to the .call method on your application. This means
that any callback based async actually needs to block in order for the
rack application you’re in to return its result. You could do it by
returning a custom object in the rack response that renders as much as
possible while it waits for the response, and then renders that when it
can, but that option may not be available depending on what framework
you’re using.

There are a couple of other things that could help you here that
immediately come to mind.

You can use Typhoeus (pauldix/typhoeus on GitHub), which can fetch all the
resources in parallel and then block until all the responses come in.
This is probably going to be relatively easy to implement, and means that
the total request time for the resources is only very slightly higher than
the longest single response.
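
To illustrate the timing claim without depending on any gem, here is a rough sketch that uses plain Ruby threads instead of Typhoeus. fake_fetch and the latency figures are invented for the example; a real version would perform HTTP requests.

```ruby
# Stand-in for an HTTP fetch; the latency values are made up for illustration.
def fake_fetch(name, latency)
  sleep latency
  "#{name} content"
end

latencies = { "news" => 0.3, "ad" => 0.1, "blog" => 0.2 }

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Launch every fetch at once, one thread per resource.
threads = latencies.map do |name, latency|
  Thread.new { fake_fetch(name, latency) }
end

# Block until every response is in; results keep the launch order.
results = threads.map(&:value)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
```

Run sequentially, the three fetches would take about 0.6 seconds. Run in parallel, elapsed should land close to the slowest one (0.3 seconds).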

You can use ESI. There’s an esi for rack project on GitHub by Joshua Hull
(joshbuddy/esi-for-rack) which could be useful to you. You can also use
ESI outside of the rack request in Apache or Nginx, by responding first
with a layout file containing ESI tags pointing to the content to use.
Nginx, Apache or the esi rack project can then assemble the page for you
using the resources specified.

Alternatively, if you’re not married too hard to rack, you can take a look
at something like cramp, or node.js for a true async server environment.

HTH
Daniel

Daniel_DeLorme · May 19, 2010, 6:30am

Tony A. wrote:

On Tue, May 18, 2010 at 7:56 PM, Daniel DeLorme wrote:

What I can’t figure out with EventMachine is how to have the “main
thread” generate the layout while the subrequests are executing.

The problem here is inversion of control. EventMachine inverts control on
you, and it sucks. You can’t just do subreq(…) and expect it to return a
value. In the best case, you have subreq call a block when it completes.
The familiar pattern of “call function, get value” no longer applies.

Sorry if I’m being redundant, but I wanted to point out that EventMachine
can support non-inverted control semantics, with a Fiber-based wrapper
layer.

For example, the following is an excerpt from a test for a single-threaded
EventMachine application. Most of the methods below are being invoked on
remote server(s), but nothing ever blocks the thread. (Other Fibers
on the same EM thread will still be responding to UI events, etc.)

def test_add_documents_to_catalog
  @app.reset_to_init_state

  catalog1_path = @app.testsv_uri +
                  URI.encode("/@default-catalog-path/catalog1")
  catalogs = @app.catalog_manager
  cat = catalogs.open_catalog(catalog1_path, :delete_existing=>true)

  num_docs = cat.query_num_documents
  assert_equal(0, num_docs)

  catalogs.active_catalog = cat
  assert_equal( cat, catalogs.active_catalog )

  doc_paths = imageset_paths(1) + imageset_paths(2)
  per_dir_doc_paths = @app.partition_filelist_per_directory(doc_paths)
  records = @app.fetch_metadata_for_partitioned_filelist(per_dir_doc_paths)
  assert_equal( doc_paths.length, records.length )

  # run the 'store' test twice, to make sure the
  # "INSERT OR REPLACE" is working...
  2.times do
    cat.store_document_metadata(records)
    num_docs = cat.query_num_documents
    num_docs_expected = doc_paths.length
    assert_equal(num_docs_expected, num_docs)
  end

  # try some searches
  paths = cat.search("caption" => "World Series")
  assert_equal( 1, paths.length )
  assert_equal( imageset_paths(2)[0], paths[0] )

  # etc.
end
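
For readers who have not seen the technique, here is a toy sketch of how such a Fiber-based wrapper can work. It is not the wrapper used in the application above and it does not use EventMachine: ToyReactor is a made-up stand-in for a callback-driven event loop, and the subreq helper is hypothetical.

```ruby
require "fiber" # needed for Fiber.current on older Rubies; harmless on newer ones

# Made-up stand-in for an event loop with a callback-style API.
class ToyReactor
  def initialize
    @pending = []
  end

  # Returns immediately; the callback is invoked later, from #run.
  def fetch(url, &callback)
    @pending << lambda { callback.call("response from #{url}") }
  end

  def run
    @pending.shift.call until @pending.empty?
  end
end

REACTOR = ToyReactor.new

# Looks like an ordinary "call function, get value" method, but it
# suspends the current fiber until the callback delivers the response.
def subreq(url)
  fiber = Fiber.current
  REACTOR.fetch(url) { |response| fiber.resume(response) }
  Fiber.yield
end

log = []

Fiber.new do
  log << subreq("http://news.server")
  log << subreq("http://ad.server")
end.resume

# Control is back here as soon as the first subreq yields.
log << "main code runs while the first request is pending"
REACTOR.run
```

After REACTOR.run returns, log holds the "main code" entry first, followed by the two responses in order. The code inside the fiber reads sequentially even though the reactor only ever calls callbacks.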

Anyway, dunno if this adds anything to the topic. Apologies if not.

Regards,

Bill

Daniel_DeLorme · May 19, 2010, 4:46am

On Tue, May 18, 2010 at 7:56 PM, Daniel DeLorme wrote:

What I can’t figure out with EventMachine is how to have the “main
thread” generate the layout while the subrequests are executing.

The problem here is inversion of control. EventMachine inverts control on
you, and it sucks. You can’t just do subreq(…) and expect it to return a
value. In the best case, you have subreq call a block when it completes.
The familiar pattern of “call function, get value” no longer applies.

Daniel_DeLorme · May 19, 2010, 7:26am

On Tue, May 18, 2010 at 10:26 PM, Bill K. wrote:

Sorry if I’m being redundant, but I wanted to point out that EventMachine
can support non-inverted control semantics, with a Fiber-based wrapper
layer.

And I’m pretty sure I was the first person to ever implement a Ruby
Fiber-based wrapper which provides normal flow control semantics on top of
an IoC-driven event-based framework with Revactor, for what it’s worth.

Even so, there’s been little success in actually applying that to an
asynchronous HTTP framework. Cramp and Rainbows are all that come to mind.
Although Revactor did support concurrent I/O alongside HTTP request
processing with Mongrel.

Daniel_DeLorme · May 19, 2010, 9:20am

Daniel DeLorme wrote:

I didn’t post that code without testing it. If you look at it a bit more
carefully maybe you’ll understand how it works. The HTTP request is sent
after the first Fiber#resume but the fiber yields before attempting to
read the response.

Oh yes, sorry about that. I’d digested one of the method_missing
sections but not the other.

It still seems like unnecessary complexity when ruby threads are cheap,
but it achieves what you want.

Daniel_DeLorme · May 19, 2010, 1:29pm

Daniel N wrote:

Ok now we’re talking. So with rack you can’t do true async with a callback.

Ah, but I never really wanted callbacks; that would be evented IO. The
various approaches to asynchronous IO are not terribly well defined, but
what I meant by nonblocking was simply:
resource.send_request #=> doesn’t block waiting for response!
resource.get_response
This is not possible with Net::HTTP because those two phases of the http
request are bound into one monolithic get(url) operation.
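
As a rough illustration of the two-phase interface described above, here is a sketch that bypasses Net::HTTP and speaks bare HTTP/1.0 over a TCPSocket. TwoPhaseGet is a hypothetical class, and the throwaway local server exists only so the example runs without network access. It ignores redirects, chunked encoding, keep-alive and error handling.

```ruby
require "socket"

# Hypothetical two-phase client: the request is written immediately,
# the response is read only when asked for.
class TwoPhaseGet
  def initialize(host, port, path = "/")
    @host, @port, @path = host, port, path
  end

  # Connect and write the request, then return without reading anything.
  def send_request
    @socket = TCPSocket.new(@host, @port)
    @socket.write("GET #{@path} HTTP/1.0\r\nHost: #{@host}\r\n\r\n")
    self
  end

  # Block until the server closes the connection, then return the body.
  def get_response
    raw = @socket.read
    @socket.close
    raw.split("\r\n\r\n", 2).last
  end
end

# Throwaway local server standing in for a slow backend.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
Thread.new do
  client = server.accept
  loop do # consume the request headers up to the blank line
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  sleep 0.1
  client.write("HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello")
  client.close
end

resource = TwoPhaseGet.new("127.0.0.1", port).send_request
layout = "layout built while the server is still working"
body = resource.get_response
```

send_request still waits for the TCP connection to be established, but not for the server to produce its response, which is where most of the time usually goes.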

callback based async actually needs to block in order for the rack
application you’re in to return its result. You could do it by returning
a custom object in the rack response that renders as much as possible while
it waits for the response, and then renders that when it can, but that
option may not be available depending on what framework you’re using.

I guess I was not clear enough, but this approach is exactly what I
tried to explain above.

You can use ESI. There’s an esi for rack project on GitHub by Joshua H.
(joshbuddy/esi-for-rack) which could be useful to you. You can also use
ESI outside of the rack request in Apache or Nginx, by responding first
with a layout file containing ESI tags pointing to the content to use.
Nginx, Apache or the esi rack project can then assemble the page for you
using the resources specified.

Oh wow this was really interesting. This sent me on an hours-long
exploration of Varnish+ESI and Nginx+SSI. It turns out that Nginx will
fetch the subrequests in parallel but Varnish (caching proxy) will not.
This is probably fine if most of the subrequests are already cached
(which admittedly is the point of a caching proxy) but if not… Nginx
is the winner.

This opens a lot of possibilities. For example I can imagine serving up
simple “skeleton” pages with heavy caching and then generating all the
user-specific parts through SSI.
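
A sketch of what such a skeleton could look like from the Ruby side, assuming Nginx sits in front with SSI enabled (ssi on) for the location. The fragment paths, the cache lifetime and the skeleton_app name are all invented for illustration. The app is a plain lambda following the Rack calling convention, so no gem is needed to run it.

```ruby
# Hypothetical fragment paths; each would have to be routed by Nginx to
# whatever backend serves that user-specific part.
FRAGMENTS = %w[/ad /news /blog /forum]

# Rack-style app: takes an env hash, returns [status, headers, body].
skeleton_app = lambda do |env|
  # Nginx SSI include directives, expanded by Nginx, not by this app.
  includes = FRAGMENTS.map { |path| %(<!--# include virtual="#{path}" -->) }
  page = "<html><body>\n#{includes.join("\n")}\n</body></html>\n"

  headers = {
    "Content-Type"  => "text/html",
    # The skeleton holds nothing user-specific, so it can be cached hard.
    # The lifetime is an arbitrary example value.
    "Cache-Control" => "public, max-age=300"
  }
  [200, headers, [page]]
end

status, headers, body = skeleton_app.call({})
```

The app itself never fetches the fragments. It only emits the include directives and leaves the assembly to Nginx.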

Daniel_DeLorme · May 19, 2010, 5:36pm

Tony A. wrote:

On Tue, May 18, 2010 at 10:26 PM, Bill K. wrote:

Sorry if I’m being redundant, but I wanted to point out that EventMachine
can support non-inverted control semantics, with a Fiber-based wrapper
layer.

And I’m pretty sure I was the first person to ever implement a Ruby
Fiber-based wrapper which provides normal flow control semantics on top of
an IoC-driven event-based framework with Revactor, for what it’s worth.

Yeah I was going to mention revactor as well.

There is “async sinatra” if that’s any help.