Automatic Scaling

Hi,

I’ve got an app which will only be dealing with a few requests a
minute for most of the time, then will shoot up to a continuous 20
req/s for an hour at a time. We’ll potentially be running a lot of
instances of this app on the same server.

Is there any way to have additional instances of Mongrel started
when the existing instance(s) stop being able to handle the volume
of traffic? I could preload enough instances to cope with peak
traffic, but that would leave a lot of instances sitting idle for
the majority of the time.

Does anyone have any experience with SCGI? I know that FastCGI will
start loading additional app instances after a certain threshold -
does SCGI behave the same way?

Thanks

Jeremy

On 10/22/07, Jeremy W. [email protected] wrote:

I’ve got an app which will only be dealing with a few requests a
minute for most of the time, then will shoot up to a continuous 20
req/s for an hour at a time. We’ll potentially be running a lot of
instances of this app on the same server.

How fast is your app? How many mongrels do you figure you need to
handle that volume?

Is there any way to have additional instances of Mongrel started
when the existing instance(s) stop being able to handle the volume
of traffic? I could preload enough instances to cope with peak
traffic, but that would leave a lot of instances sitting idle for
the majority of the time.

I am working on something that will do exactly that. It’s not ready
for public consumption, but it will permit one to have a mongrel
cluster that self-adjusts to the load it is receiving, with real-time
reporting of cluster status.

Kirk H.

On 22 Oct 2007, at 17:45, Kirk H. wrote:

On 10/22/07, Jeremy W. [email protected] wrote:

I’ve got an app which will only be dealing with a few requests a
minute for most of the time, then will shoot up to a continuous 20
req/s for an hour at a time. We’ll potentially be running a lot of
instances of this app on the same server.

How fast is your app? How many mongrels do you figure you need to
handle that volume?

Hopefully very fast - I’m aiming to get it down to 1 or 2 database
queries per (ajax) request, with just a small amount of text being
sent back. We’re currently planning this part of the app, so I don’t
have any stats yet. My concern is that it will probably be hosted on
our existing (heavily loaded) PHP server until the clients need
enough instances to justify their own server.

it will permit one to have a mongrel cluster that self-adjusts to the
load it is receiving, with real-time reporting of cluster status.

Sounds very interesting - I’m pretty new to all this, but this seems
to be one area where FastCGI (and I presume SCGI) has a significant
advantage over the mongrel cluster approach. How long do you
anticipate it will take to develop your solution? (Just curious - I
know it’s a “when it’s done” thing.)

Thanks

jebw

Several years back I accidentally discovered that multiple processes can
listen on the same TCP/IP socket. The trick, of course, is that all the
processes are in the same process group, and the socket is opened by a
shared parent. The OS was somehow managing to queue up the various
calls to accept() on that socket. Since the watchdog parent / multiple
child servers design is a common one, this was a workable solution on
the versions of Linux we were using.

IIRC, the OS gracefully queued several processes’ calls to accept(),
requiring no additional synchronization. But, even if that weren’t the
case, there is still the option of putting an acceptor in a parent and
dispatching the client socket to available child servers.

Anyhow –

The application I wrote has a watchdog process in C that opens up a
server socket before forking child server processes. The children get
passed the descriptor number for that server socket as an argument on
their command lines.

All child server processes then enter an accept loop. They all call
accept() on that same, shared descriptor. Each child, btw, opened up
its own “private” admin socket on a port that was an offset of the main,
shared service port (and optionally on a different interface as well).

Within a pool, then, processes are somewhat self-balancing – a process
only calls accept() when it has threads available and ready to handle
a request. Clients, or a client load balancer, don’t have to keep
track of traffic or request counts across individual server processes.
They also don’t have to try back-end app servers individually before
finding a “thread” that’s free – if any process in the pool is
available, it’s already sitting in accept() on that shared socket
(likely with others queued up behind it in their accept() calls).
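
For what it’s worth, the pattern is easy to sketch in Ruby (rough and
untested; the port and pool size are made up, and where my original C
watchdog passed the descriptor number on the command line and reopened
it with something like Socket.for_fd, this just relies on fork
inheritance):

  require 'socket'

  POOL_SIZE = 4                              # made-up pool size
  server = TCPServer.new('0.0.0.0', 8080)    # parent opens the shared socket once

  pids = (1..POOL_SIZE).map do
    fork do
      # Every child inherits the same listening descriptor; the kernel
      # wakes one of the blocked accept() callers per incoming connection.
      loop do
        client = server.accept
        client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        client.close
      end
    end
  end

  trap('INT') { pids.each { |pid| Process.kill('TERM', pid) rescue nil } }
  Process.waitall    # the watchdog just waits on (and could respawn) the children

Note that nothing in there tracks which child is busy; a child simply
doesn’t call accept() again until it has finished with the previous
client.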

If Mongrel’s suitable as an Internet-facing front-end, then there might
be, for many applications, no need for a load balancing proxy. Simply
fire up a pool of mongrels at port 80, and they’ll sort it all out among
themselves. Even for applications requiring multiple machines a scheme
like this would simplify load balancer proxy configurations (100
mongrels in a pool? No problem – all one port!)

I’m sure the folks who wrote Mongrel thought of this and either tried it
or rejected it beforehand for good reason. And I had the luxury of
coding for just one platform. Perhaps other platforms impose hurdles
that make this impractical. But even there, isn’t there the Apache model
of a parent acceptor() passing client sockets to ready children?
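
On Unixy platforms that model is also easy to sketch in Ruby, using
UNIXSocket#send_io / #recv_io to ship accepted descriptors to the
children (again rough and untested; the port, the pool of two children,
and the trivial handler are made up for illustration):

  require 'socket'

  server = TCPServer.new('0.0.0.0', 8080)     # made-up port

  # One UNIX socketpair per child; the parent keeps one end, the child the other.
  pipes = (1..2).map do
    parent_end, child_end = UNIXSocket.pair
    fork do
      parent_end.close
      loop do
        client = child_end.recv_io(TCPSocket)  # block until the parent hands us a connection
        client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        client.close
      end
    end
    child_end.close
    parent_end
  end

  # The parent is the sole acceptor; it ships each accepted descriptor to a child.
  # Round-robin here; a smarter parent would hand out work only to idle children.
  n = 0
  loop do
    client = server.accept
    pipes[n % pipes.size].send_io(client)
    client.close                               # the child now holds its own copy of the fd
    n += 1
  end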

Thoughts?

On 10/23/07, Robert M. [email protected] wrote:

[snip]

Perhaps other platforms impose hurdles that make this impractical. But
even there, isn’t there the Apache model of a parent acceptor() passing
client sockets to ready children?

Thoughts?

I believe that the main issue here is on the win32 platform, Luis?

We do have something similar in the works for a future release; however,
I am unsure how your suggestion ties in at the moment. It appears to be
well worth investigating for what we have planned.

Thank you kindly for this,

~Wayne

On 10/23/07, Robert M. [email protected] wrote:

Several years back I accidentally discovered that multiple processes can
listen on the same TCP/IP socket. The trick, of course, is that all the
processes are in the same process group, and the socket is opened by a
shared parent. The OS was somehow managing to queue up the various
calls to accept() on that socket. Since the watchdog parent / multiple
child servers design is a common one, this was a workable solution on
the versions of Linux we were using.
[snip]
Hi,

I thought I’d chime in as a Windows developer. I’m running Windows in
development mode and deploying to Linux (Ezra’s rig at EngineYard
actually).

If this idea actually works, it’s super appealing for me as a simple
solution to the “slow returning mongrel” load balancing challenge.
(i.e. the problems where mongrels are loaded up with requests
irrespective of their availability, leaving some mongrels idle and some
overloaded.)

I’ll give you a simple example: we have to build admin tools
periodically. It’s possible to make these tools spin off background
processes, but that’s more time-consuming to build and debug. Since
these tools are just basic admin utilities, we want to invest as
little time as possible in them: their initial functionality is 95% of
their overall value. So, for example, one of these tools lets us bulk
manage photos and can take 10 seconds to return. Because of the
mongrel/nginx/load-balancer architecture, we can run into performance
trouble if we aren’t careful about what time of day we use the tool.

You can complain about my sloppy software architecture but I’ll
complain that I’m running a business and need to invest most in what
returns the most value to our customers. But I’d prefer to just agree
not to start that discussion. [smile]

If I understand what’s being discussed (not a given), the system Robert
is proposing would mean that a mongrel would only get requests from
port 80 when it was ready to call “receive()” again from the IP stack.
So mongrels would consume what they can off a common stack.

This would mean that I don’t have to balance all my Rails processes to
keep them returning results at roughly equal intervals. If my servers
are heavily loaded now, it seems like a slow returning mongrel can
cause havoc even if it’s just slow by a couple of seconds…

So giving the mongrels a common pool of requests to consume from, and
using something as low-level as an IP port for that pool, seems great,
and I’d vote for it. The fact that my development box doesn’t run
against the same code is totally ok.

I’m sure Windows people who run in production will have other opinions
but this idea, if it’s practical, seems very elegant and useful. +1

Steve

Steve M. wrote:

If this idea actually works, it’s super appealing for me as a simple
solution to the “slow returning mongrel” load balancing challenge
(i.e. the problems where mongrels are loaded up with requests
irrespective of their availability, leaving some mongrels idle and some
overloaded.)

If you are using mod_proxy_balancer then I’m fairly sure that “the slow
returning mongrel” challenge has been solved using “max” and “acquire”
parameters to BalancerMember.

It’s working for me, and I tested using a pool of Nitro servers with a
controller method that sleeps. I was hoping that Rafael G. would
report back with his results on this with his issue (though with “conn
refused” I think his issue may be a little different).

Here’s the product of my divinations and experiments:

# note: no trailing slash on the balancer name
<Proxy balancer://myserverpool>
BalancerMember http://192.168.10.10:10000 keepalive=on max=1 lbset=0 acquire=1 timeout=1
BalancerMember http://192.168.10.10:10001 keepalive=on max=1 lbset=1 acquire=1 timeout=1
BalancerMember http://192.168.10.10:10002 keepalive=on max=1 lbset=0 acquire=1 timeout=1
</Proxy>

If I understand what’s being discussed (not a given) the system Robert
is proposing would mean that a mongrel would only get requests from
port 80 when it was ready to call “receive()” again from the IP stack.
So mongrels would consume what they can off a common stack.

Yes. Minor nit – each process is in an accept() loop: accept a
connection, process the request (it’s during processing that the read
occurs), then, after processing and closing the socket to the client,
re-enter accept() to wait for another request.

This would mean that I don’t have to balance all my Rails processes to
keep them returning results at roughly equal intervals. If my servers
are heavily loaded now, it seems like a slow returning mongrel can
cause havoc even if it’s just slow by a couple of seconds…

Servers within the pool would auto-balance. You’d still have to balance
across pools if they’re running on multiple hosts. But even so, yes,
this is an improvement.

Swiftiply as I understand it also addresses the Mongrel/Rails request
queuing problem – I haven’t researched it yet. The max/acquire
settings are working for us, and I experimented with a sleeping
controller method to verify that it works.

So giving the mongrels a common pool of requests to consume from, and
using something as low-level as an IP port for that pool, seems great,
and I’d vote for it. The fact that my development box doesn’t run
against the same code is totally ok.

Is there a cheap or demo version of Visual Studio I could try this on?
I’ve got XP Home Edition running under VMWare, and could code a simple
test to see if this works there. While I’m at it, I could verify that
this works on Mac OSX as well…

I’m sure Windows people who run in production will have other opinions
but this idea, if it’s practical, seems very elegant and useful. +1

Steve

It may be a horrendous hack for reasons I’m unaware of. But it worked,
and in a pretty demanding scenario, essentially unchanged for eight
years and counting (ZDNet, then CNET ad servers), so I didn’t research
it too deeply. I looked for problems like immediate error returns
from accept(), deadlocks, and delays with growing queues of accept()
callers, but didn’t find any. I keep meaning to read through Linux’s
TCP/IP stack to find out why it was so. Maybe there’s a non-standard
quirk in there that makes it all just dumb luck… darn… now I need to
know…

I found out why it works on Linux 2.6.20. In net/ipv4/tcp_ipv4.c,
accept is mapped to inet_csk_accept in net/ipv4/inet_connection_sock.c.
inet_csk_accept calls inet_csk_wait_for_connect to wait for a
connection, and that function locks the socket so that if multiple
processes are calling accept() on that socket, only one process at a
time will be awakened when a connection request arrives.

This will require some change in Mongrel’s request loop. Mongrel
currently re-enters accept even if it’s got a worker locking Rails. If
Mongrels in the pool are calling accept while still locking Rails, then
we’re right back to square one: request queuing. Mongrel would need to
either poll for worker list length == 0 before calling accept, or have a
“single-worker” mode in which it joins a worker thread as soon as it’s
started, and only calls accept() after returning from that join.
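
Something along these lines – a rough Ruby sketch of the single-worker
idea, not Mongrel’s actual loop, with the port and response made up:

  require 'socket'

  server = TCPServer.new('0.0.0.0', 3000)   # stand-in for the shared Mongrel socket

  loop do
    client = server.accept                  # only reached when no worker is running
    worker = Thread.new(client) do |sock|
      # Rails dispatch would happen here; Rails holds its big lock, so
      # there's no point accepting another request in the meantime.
      sock.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
      sock.close
    end
    worker.join                             # re-enter accept() only after the worker finishes
  end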

On 10/23/07, Jeremy W. [email protected] wrote:

How fast is your app? How many mongrels do you figure you need to
handle that volume?

Hopefully very fast - I’m aiming to get it down to 1 or 2 database
queries per (ajax) request, with just a small amount of text being
sent back. We’re currently planning this part of the app, so I don’t
have any stats yet. My concern is that it will probably be hosted on
our existing (heavily loaded) PHP server until the clients need
enough instances to justify their own server.

It’s hard to gauge when you are sharing machine cycles with a heavily
loaded PHP app, but, depending on what those queries really do, 20 r/s
should be trivial to get with modest hardware and a tiny number of
mongrels (just a single mongrel is likely practical for a load that
low).
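
(Back of the envelope: if each of those requests takes, say, 25-50 ms
inside Rails, one single-threaded mongrel tops out at roughly 20-40 r/s,
so your 20 r/s peak is about one busy mongrel’s worth of work – assuming
the PHP load leaves it the CPU time.)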

Sounds very interesting - I’m pretty new to all this, but this seems
to be one area where FastCGI (and I presume SCGI) has a significant
advantage over the mongrel cluster approach. How long do you
anticipate it will take to develop your solution? (Just curious - I
know it’s a “when it’s done” thing.)

It’s part of the Swiftiply 0.7.0 feature set. I’m already late on
when I wanted to release it, but realistically, it’s probably another
month or so away.

Kirk H.