Fair Proxy Balancer

Hello,

In case people haven’t seen it:
http://www.brainspl.at/articles/2007/11/09/a-fair-proxy-balancer-for-nginx-and-mongrel

“So EngineYard.com put out a bounty on getting a fair proxy balancer
that would keep track of when a mongrel is busy and not send requests
to it until it has finished and is ready to take requests. This means
that now we can effectively queue inside nginx very efficiently and
only send requests to non busy mongrels. Thereby avoiding the waiting
in the wrong line for service problem described above.”

I am guessing that the strategy is at the network level, so it should
work with non-mongrel back-ends.

On 22.11.2007 at 22:38, Adam Z. wrote:

only send requests to non busy mongrels. Thereby avoiding the waiting
in the wrong line for service problem described above."

I am guessing that the strategy is at the network level, so it should
work with non-mongrel back-ends.

Great news, this will be a huge step for Zope zeo-deployments as
well. One question, how is the busy state determined? In case of zeo
each backend client can take some defined number of requests in
parallel, how is such a case handled? (Not knowing mongrels, are they
taking only one request each?)

With regards,

__Janko H.

On 11/22/07, Janko H. [email protected] wrote:

On 22.11.2007 at 22:38, Adam Z. wrote:

In case people haven’t seen it:
http://www.brainspl.at/articles/2007/11/09/a-fair-proxy-balancer-for-nginx-and-mongrel

Great news, this will be a huge step for Zope zeo-deployments as well.

Great news indeed. We recently switched from Lighttpd to Nginx using
this patch on a live site (20 mongrels across two boxes, load average
~3.0), and it seems completely stable.

One question, how is the busy state determined? In case
of zeo each backend client can take some defined number of requests
in parallel, how is such a case handled?

I have not studied the sources, but I expect it will pick the upstream
with the fewest number of current pending requests; among upstreams
with the same number of concurrent requests, the one picked is
probably arbitrary.

(Not knowing mongrels, are
they taking only one request each?)

Mongrel supports multiple concurrent requests, but uses Ruby’s green
threads (which I like to think are closer to yellow and shaped like a
banana) to process them.

The Mongrel Rails dispatcher uses a single process-wide lock around
the dispatching logic, meaning that a single Mongrel-Rails process can
only process one request at a time.

Alexander.

On Nov 22, 2007, at 2:16 PM, Janko H. wrote:

In case of zeo each backend client can take some defined number of
requests in parallel, how is such a case handled? (Not knowing
mongrels, are they taking only one request each?)

With regards,

__Janko H.

Mongrels are single threaded when running rails, so this fair balancer
tries to send requests only to mongrels that are not currently serving
requests. But once all the backends are busy, it uses some nice
weighting algorithms to still serve requests to the least congested
backends. There is some nice rbtree and scheduling/weighting code in
there now as well. The module works with 0.5.x and 0.6.x, and once it
is completely stable we plan to offer it to Igor to see if he wants to
include it in the main distro.

There was just a new push of code so if anyone wants to play you can
go here:

http://git.localdomain.pl/?p=nginx.git;a=tree;hb=upstream_fair-0.6

And click on snapshot to grab the latest snapshot. Please play with
this and report any weird results or problems you experience so we can
improve the module.

I am already using this in a number of production sites and it works
great, a huge improvement for rails apps running on nginx + mongrel.
Of course this is still alpha stuff, so don't put it into mission
critical production setups yet. But in general I am very happy with
the stability and performance of this new module.

This new module will actually work with any http backend and is not
directly tied to mongrel at all, so it is probably good for other,
non-rails backends as well. Please test your setups with this module.
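
Enabling it is just one new directive in the upstream block. A rough
sketch (the addresses and ports here are placeholders, adjust them to
your setup):

upstream mongrels {
    fair;
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://mongrels;
    }
}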

Grzegorz has done an awesome job on this and should be commended.

Cheers-

On 22.11.2007, at 22:38, Adam Z. wrote:


I am guessing that the strategy is at the network level, so it should
work with non-mongrel back-ends.

Good! That means we can get rid of our LVS. I can really agree that
least-open-connections load balancing adds a lot of speed.

jodok




Lovely Systems, Partner

phone: +43 5572 908060, fax: +43 5572 908060-77
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria

Hi,

On 2007/11/23, Alexander S. [email protected] wrote:

One question, how is the busy state determined? In case
of zeo each backend client can take some defined number of requests
in parallel, how is such a case handled?

This should work out of the box, distributing the load equally. You
may wish to specify a weight for each backend, but if all backends are
equal this should have no effect.
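
For example (a hypothetical sketch; weight is the standard upstream
server parameter and defaults to 1):

upstream zeo_backends {
    fair;
    server 10.0.0.1:8080 weight=4;  # can take more parallel requests
    server 10.0.0.2:8080 weight=2;
    server 10.0.0.3:8080;           # weight 1
}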

I have not studied the sources, but I expect it will pick the upstream
with the fewest number of current pending requests; among upstreams
with the same number of concurrent requests, the one picked is
probably arbitrary.

The scheduling logic looks like this:

  • The backends are selected mostly round-robin (i.e. if you get 1
    req/hour, they’ll be serviced by successive backends)
  • Idle (no requests currently serviced) backends have absolute
    priority (an idle backend will be always chosen if available)
  • Otherwise, the scheduler walks around the list of backends
    (remembering where it finished last time) until the scheduler score
    stops increasing. The highest scored backend is chosen (note: not all
    backends are probed, or at least not always).
  • The scheduler score is calculated roughly as follows (yes, it could
    be cleaned up a little bit):

score = (1 - nreq) * 1000 + last_active_delta;
if (score < 0) {
    score /= current_weight;
} else {
    score *= current_weight;
}

  • nreq is the number of requests currently being processed
  • last_active_delta is the time since the last request start or stop
    (serviced by this backend), in milliseconds
  • current_weight is a counter decreasing from the backend's weight
    down to 1 with every serviced request

It has a few properties which (I hope) make it good:

  • penalizing busy backends, with something like a pessimistic
    estimate of request time
  • rewarding backends which have been servicing a request for a long
    time (statistically they should finish earlier)
  • rewarding backends with higher weight more or less proportionally.

Please give the module a try and report any issues you might find.

Best regards,
Grzegorz N.

Thanks for the explanation. I will give it a try shortly.

With regards,

__Janko

On 23.11.2007 at 13:38, Grzegorz N. wrote:

Hi. It has been a while since the introduction of the fair proxy
balancer. How stable is it for production use? I was looking at
potentially using haproxy or LVS, but I am hoping the new balancer is
stable enough, since I don't want to unnecessarily complicate things
with even more layers of software in the stack. Is anyone using it in
production who can comment?

Regards,
David

On Jan 31, 2008 4:27 AM, David P. [email protected] wrote:

Hi. It has been a while since the introduction of the fair proxy
balancer. How stable is it for production use? I was looking at
potentially using haproxy or LVS, but I am hoping the new balancer is
stable enough, since I don't want to unnecessarily complicate things
with even more layers of software in the stack. Is anyone using it in
production who can comment?

We have been running the fair proxy balancer patch in production since
November. We had a problem with an earlier version of the patch in
combination with long-running requests (if you’re interested, search
the list archives for my post), but the current one solves this issue.
Other than that it’s been smooth sailing.

Alexander.

Alex, is the patch for the bug you mention part of the latest nginx?
And how do you use the fair proxy balancer?
Thanks a lot.


On Jan 31, 2008 12:07 PM, Mark [email protected] wrote:

Alex, is the patch for the bug you mention part of the latest nginx?
And how do you use the fair proxy balancer?

The module is not part of Nginx. See
http://wiki.codemongers.com/NginxHttpUpstreamFairModule.

Alexander.

Hi Alexander. This is great. BTW, what would you recommend for
monitoring connections, to see that it is balancing fairly without
idle connections or directing requests to a potentially blocked
backend due to long-running requests? Many thanks.

Regards,
David

Hi Igor. Since the fair proxy balancer appears to be doing the right
thing, is there anything stopping the necessary changes from being
brought into the trunk (without patching), so that only the module
would be required? Many thanks.

Regards,
David

On Jan 31, 2008 2:23 PM, David P. [email protected] wrote:

Hi Alexander. This is great. BTW, what would you recommend for
monitoring connections, to see that it is balancing fairly without
idle connections or directing requests to a potentially blocked
backend due to long-running requests? Many thanks.

The number of "writing" connections (as reported by the stub status
module) will in part reflect the number of connections blocking.
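
If you don't have stub status enabled yet, it is something like this
(nginx must be built with --with-http_stub_status_module; the location
name below is an arbitrary choice):

location /nginx_status {
    stub_status on;
}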

Other than that, we’re using Mongrel with my process title
monkeypatcher (http://purefiction.net/mongrel_proctitle/), which
changes the Unix process title to reflect the number of queued
requests.

Alexander.

On Thu, Jan 31, 2008 at 09:31:29AM -0400, David P. wrote:

Hi Igor. Since the fair proxy balancer appears to be doing the right
thing, is there anything stopping the necessary changes from being
brought into the trunk (without patching), so that only the module
would be required? Many thanks.

Hi,

The fair proxy balancer doesn’t really need any patches in nginx core. I
have published it as a complete source repository for my own convenience
but One Day™ I’ll publish it as a standalone module.

The code is self-contained in src/http/modules/ngx_http_upstream_fair.c
so if you wire it into the nginx build process any other way, it should
work fine.
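
(Once it is packaged as a standalone module with its own config file,
wiring it in would be the usual third-party module procedure, roughly:

./configure --add-module=/path/to/ngx_http_upstream_fair
make && make install

where the path is wherever you unpacked the module.)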

There are a few functions which are generic enough to warrant
inclusion in nginx core (and converting nginx code to use them), but
by no means is that necessary for the module to function.

One known issue (again, waiting for the One Day) is that the round-robin
part doesn't work too well. E.g. if your load is very low, all requests
will go to the first backend. Anyway, I'll keep the current behaviour as
an option, as it may be useful in dimensioning your backend cluster
(i.e. if the Nth backend has serviced no requests, N-1 backends should
be enough).

Best regards,
Grzegorz N.

Hi Grzegorz. I appreciate your explanation. It would be more
convenient to compile it as an option, since I am using an automated
build process. If it is self-contained, can you foresee any problems
building with the most current 0.5.x branch, or is this strictly
0.6.x? Also, what is the request threshold that triggers the
round-robin issue, so that I am aware? Many thanks.

Regards,
David

On Thu, Jan 31, 2008 at 01:29:32PM -0400, David P. wrote:

Hi Grzegorz. I appreciate your explanation. It would be more
convenient to compile it as an option, since I am using an automated
build process. If it is self-contained, can you foresee any problems
building with the most current 0.5.x branch, or is this strictly
0.6.x? Also, what is the request threshold that triggers the
round-robin issue, so that I am aware? Many thanks.

The module works with 0.5.x as well as 0.6.x (if it doesn’t work for
you, please mail me with a bug report).

There’s no threshold per se, it’s just that the original load balancer
directs requests strictly round robin, i.e. 0-1-2-3-0-1-2-3 etc. This
ensures that every backend gets the same number of requests.

upstream_fair always starts from backend 0 and works its way up until
it finds an idle peer (more or less). If your load effectively uses a
single backend at a time, it will always be backend 0. If it needs two
backends, they will be 0 and 1, etc. Thus the first backend will
always have served the most requests, the second one will have served
more than the third, etc.

Best regards,
Grzegorz N.

Hi Grzegorz. This gives me a much better idea of what to expect; thank
you. I am curious whether you have done anything in the way of
comparing the effectiveness of the fair proxy balancer to other
balancing schemes like haproxy or LVS. Speed is a big factor for our
deployments, so I am hoping it will be good, given the simplicity this
option presents. Many thanks.

Regards,
David

Yes, it would be great if the fair proxy balancer were added to the
nginx trunk.

Hey, sorry for the late response here, but I thought I should mention:
we're using the fair proxy balancer on a rails site that averages over
100 million hits/month. We've been using it for about a month now, and
love it.

Also, to Alexander: our programmer especially wanted me to thank you
for coming up with that mongrel process title patch, that thing is
awesome! :)

-Rob.