Dynamically generating 10k pages per second

On 7/31/06, Francis C. wrote:

You’re not defining what a “burst” is.

Perhaps I’m using the term incorrectly. “An abrupt, intense increase;
a rush: a burst of speed; fitful bursts of wind.” is how I mean it.

In my experiences on extremely high-load sites, I’ve seen the barriers
emerge the earliest in the RDBMS. (And I’ve seen people try to address
this with enormously expensive computers and Oracle licenses, which only
gets you so far. A far better answer is not to use an RDBMS.)

Huh, first time I’ve heard that.

I’ve even seen the interpacket delay on switched ethernet links inside
the server farm become the bottleneck.

In this case, it makes sense for you to design to the highest-traffic
case, and as I’m suggesting, you may hit barriers that you will have to
solve in very non-standard ways.

I realize it depends entirely on the application and usage patterns,
but at what point does “standard share-nothing” practice break down?
I assume it’s somewhere between 1,000 and 10,000 requests per second?

If I can get that many hits per day, my wife and I would be working from
Hawaii in a nice beach house with a lifted Tundra truck parked outside.

On Mon, Jul 31, 2006, Francis C. wrote:

In my experiences on extremely high-load sites, I’ve seen the barriers
emerge the earliest in the RDBMS. (And I’ve seen people try to address
this with enormously expensive computers and Oracle licenses, which only
gets you so far. A far better answer is not to use an RDBMS.)

A less extreme solution to the database bottleneck is to cache heavily.
Assuming you have the resources for it (a bunch of RAM on a few
servers), memcached can be extremely effective at alleviating the db
bottleneck.

dev(E)iate (the original client)
http://dev.robotcoop.com/Libraries/ (memcache-client and cached_model)

Robot Coop’s library is said to be faster than the original, but I
haven’t tested it. cached_model makes it very easy to cache simple queries
(see documentation for details).
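
In case it helps, here’s a minimal read-through caching sketch using the
memcache-client gem. The server addresses, namespace, Story model and
ten-minute expiry are all placeholders; in a real app you’d wrap this in
a helper (or let cached_model handle it for you):

  require 'memcache'   # memcache-client gem

  # Server addresses and namespace are placeholders.
  CACHE = MemCache.new(['10.0.0.1:11211', '10.0.0.2:11211'],
                       :namespace => 'myapp')

  # Read-through caching: check memcached first, fall back to the db,
  # then store the result so the next request never touches the db.
  def cached_top_stories
    key = 'top_stories'
    stories = CACHE.get(key)
    if stories.nil?
      # Story is a hypothetical ActiveRecord model.
      stories = Story.find(:all, :order => 'created_at DESC', :limit => 20)
      CACHE.set(key, stories, 10 * 60)   # expire after ten minutes
    end
    stories
  end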

Stefan K. also discusses storing sessions in memcached for additional
speed boosts:
http://www.railsexpress.de/blog/articles/2006/01/24/using-memcached-for-ruby-on-rails-session-storage

In general, the railsexpress blog is an awesome resource for performance
tricks.

I’ve even seen the interpacket delay on switched ethernet links inside
the server farm become the bottleneck.

I’m not sure I buy this. The high-load sites I’m familiar with still
use apache (LiveJournal and slashdot are two good examples). While you
will get a large performance boost out of writing your own specialty
infrastructure, it’s not worth the time for the vast majority of sites
(even those doing more traffic than Joe is talking about).

The mantra of successful scaling is almost always “scale out not up”.
More machines and better load balancing is almost always better than
bigger machines running fancier software. Load balancing is an easier
problem to solve than reinventing the (HTTP) wheel.

Ben

Ben B. wrote:

I’m not sure I buy this. The high-load sites I’m familiar with still
use apache (LiveJournal and slashdot are two good examples). While you
will get a large performance boost out of writing your own specialty
infrastructure, it’s not worth the time for the vast majority of sites
(even those doing more traffic than Joe is talking about).

The mantra of successful scaling is almost always “scale out not up”.
More machines and better load balancing is almost always better than
bigger machines running fancier software. Load balancing is an easier
problem to solve than reinventing the (HTTP) wheel.

It sounds like you habitually build dynamic web sites that sustain
10,000 hits/second or more. I’ve only worked on a handful of them, and
they were all well-known sites with very large amounts of VC funding
behind them. The vast majority of sites should be built with commodity
tools and commodity hardware; they don’t make economic sense otherwise,
and that’s obvious. But for really serious, high-value applications with
extremely large working sets, I can tell you firsthand that the
investment in custom software really can pay off. (And I’ve already
written low-drag, event-driven custom HTTP servers, multiple times; it’s
not as hard as some may think.)

I have no doubt that a site like slashdot can scale easily enough with
commodity software. Try something like DoubleClick or Google, though.

(But of course we’re offtopic since the OP has already clarified that
his scale requirements are not sustained.)

On Jul 31, 2006, at 4:36 AM, Rimantas L. wrote:

<…>

Is it impossible to imagine that someone has a good idea, has done
some research, is slightly over-optimistic (but not necessarily
wrong!), and wants to get an idea of what it might take to handle that
sort of load?

Getting Real

He didn’t say he was going to go out and scale; he said he wanted
to know what was required to do so.

It’s conceivable, for instance, that the economics of the project
require high loads before it’ll ever be profitable. Many people,
perhaps many of those who read the book, and perhaps those who
wrote it, might look down their noses at such projects.

However, with that attitude, we might not have light bulbs or the
electrical power to run them, as those were projects that both
required massive scale before they made any money at all.

And those guys put a lot of thought into making sure they could
scale long before it was required.

I read the book, and I agree with its premises. But the book doesn’t
mean the OP:

  1. Was insane
  2. Has insane customers
  3. Should be embarrassed for asking


– Tom M.

On Mon, Jul 31, 2006, Francis C. wrote:

It sounds like you habitually build dynamic web sites that sustain
10,000 hits/second or more. I’ve only worked on a handful of them, and
they were all well-known sites with very large amounts of VC funding
behind them. The vast majority of sites should be built with commodity
tools and commodity hardware; they don’t make economic sense otherwise,
and that’s obvious. But for really serious, high-value applications with
extremely large working sets, I can tell you firsthand that the
investment in custom software really can pay off. (And I’ve already
written low-drag, event-driven custom HTTP servers, multiple times; it’s
not as hard as some may think.)

I’m not sure why you think it sounds like that. I said “the sites I’m
familiar with” and gave two examples that are much smaller than
DoubleClick and Google. That said, they’re also much closer to the
theoretical site the OP was talking about than a site like Google.

I would suggest that if we really were talking about a site on the level
of Google (which we’re not), building a custom HTTP layer would only
defer the application-level bottleneck. Sooner or later it’s going to
come back to Rails.

Since we’re talking about a Rails app, in my opinion it only makes sense
to consider commodity (i.e., outward) scaling. I have no doubt that what
you’re talking about is highly effective, and given an appropriate
level of resources it would be the best solution… but how many sites
that are planning on maybe peaking at 10k hits/sec are in a position to
invest that much?

I have no doubt that a site like slashdot can scale easily enough with
commodity software. Try something like DoubleClick or Google, though.

Yep, and the former is what I was talking about.

Ben

Ben, I’ll take one more bite at this apple, and my apologies for
continuing the threadjack. You said this:

The high-load sites I’m familiar with still
use apache (LiveJournal and slashdot are two good examples).

which I took to mean that you are part of the engineering team for these
sites and others with similar loads, so you have firsthand knowledge. (I
don’t know how much traffic LiveJournal gets.)

The actual application profile determines to a great degree what kind of
scalability you will need, and there are interesting tradeoffs all over
the place. It’s not universally true, for example, that a site becomes
superlinearly more valuable with usage. (Put differently, Metcalfe’s Law
may not be true in general.) That makes it reasonable for people
approaching a site that may someday become really big to use the
commodity-software, outward-scaling approach, which in essence considers
development effort (including time-to-market) to be more expensive than
operational costs.

My real point (borne out by first-hand experience) is twofold:

First, with extremely large sites (and there are so few of them in
reality that each one is a special case, and there really are no
universally-applicable best-practices), you have to consider that
operational costs at a certain point really do outweigh development
costs. As an extreme example, Eric Schmidt has said that one of the
biggest costs Google faces is for electricity. Well-engineered custom
software can be the difference between economically possible and
not-possible.

Second, with some problems, outward scaling simply can’t be made to
work. One of your examples is slashdot. Think about what /. does and you
can see that there are multiple points where scalability can be enhanced
by partitioning working sets, introducing update latency, etc etc. But
look at something like an ad-serving network, which relies on a vast
working set that is in constant flux. You can’t just scale that up by
adding more machines and more switched network segments. Very early in
that process, your replication-traffic alone will swamp your internal
network. (I’ve seen that myself, which is why I mentioned Ethernet
interpacket delays as a critical factor.)
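
As a hedged back-of-envelope sketch of that effect (every number below
is invented purely for illustration): if each of N nodes holds a full
copy of the working set and accepts writes, every write has to be pushed
to the other N-1 copies, so replication traffic grows roughly with the
square of the node count.

  # Full replication of a hot working set; all figures are made up.
  nodes       = 50          # servers each holding a complete copy
  writes_sec  = 2_000       # updates per second arriving at each node
  bytes_write = 512         # average payload per update

  # Every write must be pushed to the other (nodes - 1) copies.
  replication_bps = nodes * writes_sec * bytes_write * (nodes - 1) * 8
  puts "replication traffic: %.1f Gbit/s" % (replication_bps / 1e9)
  # => roughly 20 Gbit/s of pure replication chatter, which swamps the
  #    switched 1 Gbit/s segments of a 2006-era server farm.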

There are many places for Ruby in such an environment. I’m working on
one now that uses a lot of Ruby, but RoR was not an appropriate choice.
We’re using a custom HTTP server that includes a bit of Ruby code,
though.

Is it impossible to imagine that someone has a good idea, has done some
research, is slightly over-optimistic (but not necessarily wrong!), and
wants to get an idea of what it might take to handle that sort of load?
No worries Tom, and you’re right, none of that matches Google, eBay or
Amazon at inception. But neither do the numbers. Unless there’s
something illegitimate going on, companies don’t suddenly get 10k hits
per second.

LiveJournal is a good example. They started off with a fairly
standard webapp and their infrastructure (software and hardware) grew
along with the load. But none of these companies, AFAIK, called up a
contractor and said “OMFG 10k hits help!”.

I think you’re totally right about them being overly optimistic. And
unfortunately that puts Joe in a bad position: it’s the result of them
not understanding how unlikely that kind of load is, and not realizing
just how dramatic the difference is between a normal webapp
infrastructure and one that can handle that kind of load. Hopefully
they have deep pockets and Joe will be able to actually deliver
something that can support that.

If you do, Joe, I’m sure we’d love to hear your solution.

As for real advice, I have to agree with Francis. The largest site I’ve
worked on averaged about 3,000,000 page views per day (mostly during
business hours), and we were only able to maintain quick response times
thanks to a custom-built search indexing system. We had a ridiculously
large database of items, and the system was so thoroughly optimized
that it took something like a day and a half to generate an updated
index.

I scoffed at it when I was first hired. But I soon became a convert.
When you deal with loads like that, you seriously need to consider
hiring some brilliant old-school coder with a greying beard.

On Tue, Aug 01, 2006, Francis C. wrote:

The high-load sites I’m familiar with still
use apache (LiveJournal and slashdot are two good examples).

which I took to mean that you are part of the engineering team for these
sites and others with similar loads, so you have firsthand knowledge. (I
don’t know how much traffic LiveJournal gets.)

I’m sorry if I gave you that impression. I did work for LiveJournal,
and while I was in the room with the engineers and heard a lot of the
scaling discussion, I was not actively a part of engineering.

What I know of Slashdot comes from hearing discussions between the LJ
and Slashdot engineers about the topic, and from their talks at OSCON
and the like. Far from first-hand knowledge, for sure, but recall that
all I said was that they use apache, and it’s undeniable that Slashdot
is a high-traffic site :)

The actual application profile determines to a great degree what kind of
scalability you will need, and there are interesting tradeoffs all over
the place. It’s not universally true, for example, that a site becomes
superlinearly more valuable with usage. (Put differently, Metcalfe’s Law
may not be true in general.) That makes it reasonable for people
approaching a site that may someday become really big to use the
commodity-software, outward-scaling approach, which in essence considers
development effort (including time-to-market) to be more expensive than
operational costs.

Of course. I should note that I don’t disagree with you at all. Maybe
I’m off by an order of magnitude, but a site which might burst to 10k
hits/sec just does not strike me as a Google-level site. Since that’s
what the OP was talking about, discussing the techniques of those sites
(while academically interesting) doesn’t seem to have bearing on the
conversation.

My real point (borne out by first-hand experience) is twofold:

First, with extremely large sites (and there are so few of them in
reality that each one is a special case, and there really are no
universally-applicable best-practices), you have to consider that
operational costs at a certain point really do outweigh development
costs. As an extreme example, Eric Schmidt has said that one of the
biggest costs Google faces is for electricity. Well-engineered custom
software can be the difference between economically possible and
not-possible.

True.

Second, with some problems, outward scaling simply can’t be made to
work. One of your examples is slashdot. Think about what /. does and you
can see that there are multiple points where scalability can be enhanced
by partitioning working sets, introducing update latency, etc etc. But
look at something like an ad-serving network, which relies on a vast
working set that is in constant flux. You can’t just scale that up by
adding more machines and more switched network segments. Very early in
that process, your replication-traffic alone will swamp your internal
network. (I’ve seen that myself, which is why I mentioned Ethernet
interpacket delays as a critical factor.)

Again, I agree completely. But (as you mention below), a CRUD-based
framework probably doesn’t make any sense in an ad-serving environment.
Frankly, that’s a much more specialized problem than the average
application, and so again seems tangential to the discussion.

Of course, this is all based on my assumption that the OP is building
the typical Rails site, since he didn’t say anything to the contrary. I
appreciate that very large sites can benefit from customized software
and clever infrastructure. But I also recognize (as was mentioned
previously!) that companies doing those kinds of sites are unlikely to
elect to use Rails unless it’s appropriate for their site.

In the context of a Rails application, outward scaling is going to be
more effective than upward scaling. That’s all I was ever saying :)

There are many places for Ruby in such an environment. I’m working on
one now that uses a lot of Ruby, but RoR was not an appropriate choice.
We’re using a custom HTTP server that includes a bit of Ruby code,
though.

Very interesting. Does the server encapsulate the application logic as
well, or can it serve other applications? How is the Ruby stuff tied
in?

Ben

ps- I think this is close enough to the topic that it’s not really a
threadjack. People interested in very large sites will be reading this
thread, and a discussion of when and how you have to move past your
framework is interesting and valuable.

Ben B. wrote:

Very interesting. Does the server encapsulate the application logic as
well, or can it serve other applications? How is the Ruby stuff tied
in?

It’s a general server we’ve used on several applications now. The
hard-core guts of it are coded in C and we put a Ruby-extension wrapper
around it so it can be started as a Ruby program. That lets us add
functionality in Ruby for handling some of the requests. With roughly
equivalent dynamically-generated loads, it’s maybe twenty times faster
than RoR+Apache+Fastcgi on the same hardware. When we build plain old
CRUD sites with this technology, we generally use a component-based
framework rather than an action-based one; it seems to make everything
easier.
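
To give a rough idea of the shape of that approach (and emphatically not
of our actual server), here’s a toy event-driven HTTP responder. I’m
using the EventMachine gem purely as a stand-in for an in-house C core,
and the request parsing is deliberately naive; the point is just that
the per-request logic stays in plain Ruby:

  require 'rubygems'
  require 'eventmachine'   # event-driven core with a Ruby-facing API

  # EventMachine mixes this module into each connection object and calls
  # receive_data as bytes arrive on the socket.
  module TinyHttpHandler
    def receive_data(data)
      (@buf ||= '') << data
      return unless @buf.include?("\r\n\r\n")   # wait for the request head
      path = @buf[/\A\w+\s+(\S+)/, 1] || '/'    # crude request-line parse
      body = handle(path)
      headers = "HTTP/1.1 200 OK\r\n" +
                "Content-Type: text/html\r\n" +
                "Content-Length: #{body.length}\r\n" +  # ASCII body, so length == bytes
                "Connection: close\r\n\r\n"
      send_data headers + body
      close_connection_after_writing
    end

    # The interesting part: request handling is ordinary Ruby code.
    def handle(path)
      "<h1>hello from #{path}</h1>"
    end
  end

  EventMachine.run do
    EventMachine.start_server '0.0.0.0', 8080, TinyHttpHandler
  end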

ps- I think this is close enough to the topic that it’s not really a
threadjack. People interested in very large sites will be reading this
thread, and a discussion of when and how you have to move past your
framework is interesting and valuable.

It might be really interesting to ask what the outer limits are of
RoR plus your typical clustered RDBMS engine. Any web site that sustains
1,000 dynamic requests per second is a major piece of engineering, and
probably not that rare a requirement either. (Being able to burst up to
10,000/sec isn’t really interesting without knowing how wide and how
frequent the bursts are.) It would be great to determine and publish
best-practices for such sites. Especially if they can be combined with
expected cost metrics. (For example, Java partisans might argue that the
3-5x development cost increment of using J2EE is well-compensated for by
requiring less hardware and infrastructure at runtime, but that might
not turn out to be true.)
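
To see why the width and frequency of the bursts matter so much, here is
a toy calculation (all figures invented): a burst adds very little total
volume, but it dictates how much capacity you have to provision, queue
for, or shed.

  sustained = 1_000                     # req/sec around the clock
  burst     = 10_000                    # req/sec at peak
  burst_len = 5 * 60                    # a five-minute burst, in seconds

  daily_sustained = sustained * 86_400              # => 86,400,000 req/day
  extra_per_burst = (burst - sustained) * burst_len # => 2,700,000 extra req

  puts "baseline:   #{daily_sustained} requests/day"
  puts "each burst: #{extra_per_burst} extra requests, yet it forces you"
  puts "to provision #{burst / sustained}x the sustained capacity."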

And of course not all CRUD applications are created equal. Some are
read-many-write-few, so caching result sets gives a big win and permits
easy outward scaling. But some are read-many-write-many. Those are
harder.
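
A hedged sketch of why the write-many case is harder, using a
hypothetical Comment model and the same memcache-client assumptions as
the earlier caching example: every write invalidates the cached result
set, so as the write rate climbs the hit rate collapses and the database
lands back on the critical path.

  require 'memcache'   # memcache-client gem; address is a placeholder
  CACHE = MemCache.new('127.0.0.1:11211')

  # Read side: serve the result set from memcached whenever possible.
  def recent_comments(story_id)
    key = "comments/#{story_id}"
    CACHE.get(key) || begin
      rows = Comment.find(:all, :conditions => ['story_id = ?', story_id],
                          :order => 'created_at DESC', :limit => 50)
      CACHE.set(key, rows, 5 * 60)
      rows
    end
  end

  # Write side: every new comment throws the cached result set away.
  # Read-many-write-few: invalidation is rare and the cache stays hot.
  # Read-many-write-many: you rebuild constantly and gain very little.
  def add_comment(story_id, attrs)
    Comment.create(attrs.merge(:story_id => story_id))
    CACHE.delete("comments/#{story_id}")
  end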