Real World Scalability and Ruby - Top 20

Folks,

After the long post regarding Joe’s now infamous entry about Ruby, I
wondered what the really successful, scalable, big websites /
web applications out there have used.

So I turned mostly to two sources, which while not perfect, are good
enough.

For popularity or raw scalability I used Alexa’s ranking here:
http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none

And for webserver I either used Netcraft:

or actually tried to figure out what the site used by looking at the
code or what the page used.
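
As a rough illustration of that second approach, here is a minimal Python sketch that collects weak hints from a page's URL and HTTP response headers. The function names and the extension table are hypothetical, made up for this example; `Server` and `X-Powered-By` are ordinary response headers that a site may omit or fake, so the output is a guess at best.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Hypothetical table: URL file extension -> technology it usually suggests.
EXTENSION_HINTS = {
    ".php": "PHP",
    ".asp": "ASP",
    ".aspx": "ASP.NET",
    ".jsp": "Java (JSP)",
    ".cfm": "ColdFusion",
    ".pl": "Perl",
}


def hints_from_response(url, headers):
    """Collect weak hints about a site's stack from one URL and its headers."""
    hints = []
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("server", "x-powered-by"):
        if name in lowered:
            hints.append(f"{name}: {lowered[name]}")
    # The file extension in the path is only a hint; sites can rewrite URLs.
    path = urlparse(url).path.lower()
    for ext, tech in EXTENSION_HINTS.items():
        if path.endswith(ext):
            hints.append(f"extension {ext}: {tech}")
    return hints


def fetch_headers(url):
    """Send a HEAD request and return the response headers as a dict."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return dict(response.headers.items())
```

Calling `hints_from_response(url, fetch_headers(url))` on a live site gives only what the front end chooses to reveal; it says nothing about what runs behind it.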

This is the list of the top 20 sites and what they are apparently
using. These days many of these sites hide what goes on inside, but
we can still guess, and yes, I have given it my best guess in some
cases.

Here is the table, feel free to add more information to it since there
are still many gaps that I have identified with a question mark. The
first line mentions the site name and web server, the second line the
language or framework behind it.

1 Yahoo FreeBSD
PERL, PHP, Proprietary, C?

2 MSN Windows Server 2000/2003, Some Apache
ASP, ASP.NET, DLLs?

3 Google Linux based or unknown servers, probably modified FreeBSD or
Apache.
Python, Perl, PHP?, C, Proprietary, Java

4 Baidu.com Linux based unknown.
?

5 Qq.com Linux based unknown and Windows 2003.
?

6 MySpace Windows 2003 / 2000 some Linux unknowns too.
Coldfusion

7 sina.com.cn FreeBSD, Solaris 8, Linux based unknowns,
?

8 Yahoo Japan Like Yahoo at 1.

9 163.com China FreeBSD, Linux based unknowns,
?

10 Live.com Windows 2003, Linux unknown servers
ASP, ASP.NET, DLLs?

11 eBay.com Windows 2000/2003
PERL?, C?, DLLs, Proprietary, more?

12 Sohu.com China Linux unknown servers
?

13 YouTube.com Linux unknown servers
?

14 Yahoo China Like 1

15 Microsoft Windows 2003 / 2000
ASP.net, ASP, DLLs?

16 Wikipedia FreeBSD, Linux unknown servers
PHP, PERL?

17 Amazon.com FreeBSD, Linux unknown servers, Solaris 8, Netware
PERL, Proprietary, more?

18 Orkut.com Linux unknown server
PHP?, PERL?

19 Blogger FreeBSD, Linux unknown servers
PHP?, PERL?

20 Google UK Like Google

INTERESTING FACTS

  • Not a single significant “safe” Java J2EE in the top 20.
  • Many proprietary variations with FreeBSD or Linux as the only common
    ground
  • Some .NET and Windows 2003 are indeed listed
  • Arguably the biggest web application is MySpace which is based on
    Coldfusion! Certainly not a “safe” choice by a long shot.

CONCLUSIONS

  • Choosing one Framework or language over another seems to be mostly
    irrelevant as long as you stick to the underlying technology: FreeBSD,
    Linux Based server or Windows 2003 which appear consistently in the top
    web sites again and again. Also, although it is not mentioned anywhere,
    Oracle, MS SQL Server and MySQL are high up there in these rankings
    too.
  • Java and J2EE are almost entirely absent from this list; this should
    tell us all something.
  • .Net is very present on the list; MS obviously is doing something
    right. The progress Ruby is making on Windows is encouraging.
  • Choosing the best tools for the job can give you a big payout.
    MySpace is Coldfusion based, this is risky, but gives you the ability
    to write database web applications fast… and it has worked well. I
    would say the risk was worth it.
  • Not a single major Ruby or Ruby on Rails app makes the cut for this
    list yet, but I see no reason why this would not happen eventually.

Food for thought, fellows. Any input on how Ruby would ever get
there, or who runs what, would be appreciated.

Jose L. Hurtado
Web D.
Toronto, Canada

On 9/9/06, Joseph wrote:

Folks,

After the long post regarding Joe’s now infamous entry about Ruby, I
wondered what the really successful, scalable, big websites /
web applications out there have used.

It seems like you’re just guessing at these. I don’t know all of them,
but I know for certain that eBay is running a crapload of Java in
their backend. In fact, they used to be a solid ASP site (pre-.NET)
but switched to Java because the ASP stuff scaled horribly. IBM made a
big media event out of that a few years back. I mean really, the site
has Sun/Java branding right at the top…so it’s probably safe to
assume Java’s involved.

It’s also extremely subjective to “look at what the site uses” because
I know for a fact many large Java-based sites use URLs showing
something other than “.jsp”, for reasons that are perhaps obvious. And
there’s another large chunk of sites that use PHP or .NET or what have
you for web-facing stuff while the vast majority of their apps are
actually backed by large Java clusters behind some variety of service
layer.

It would probably be better to leave the guesses off the list
completely and not try to draw any conclusions at all. Unless you
really know what these sites are using (a difficult prospect at best)
no conclusions are possible.

On Sun, Sep 10, 2006 at 12:30:09PM +0900, Joseph wrote:

Here is the table, feel free to add more information to it since there
are still many gaps that I have identified with a question mark. The
first line mentions the site name and web server, the second line the
language or framework behind it.

1 Yahoo FreeBSD
PERL, PHP, Proprietary, C?

Also Python and Common Lisp (though the Lisp codebase is not growing at
this point – it’s “legacy code” that is indispensable as long as they
keep their RTML templating system).

2 MSN Windows Server 2000/2003, Some Apache
ASP, ASP.NET, DLLs?

I believe they’re still using some FreeBSD systems at Hotmail, and all
of Windows is behind free unix firewalls through a proxy service.

6 MySpace Windows 2003 / 2000 some Linux unknowns too.
Coldfusion

Migrating to BlueDragon.NET, which uses .NET as the back end for
ColdFusion, last I checked.

ASP, ASP.NET, DLLs?
14 Yahoo China Like 1

15 Microsoft Windows 2003 / 2000
ASP.net, ASP, DLLs?

16 Wikipedia FreeBSD, Linux unknown servers
PHP, PERL?

The Wikimedia Foundation (Wikipedia, Wikinews, et cetera) has to my
knowledge only ever had a grand total of one FreeBSD server, and it
wasn’t really used in production. The servers are primarily running on
Fedora Core 3-5, with a couple of old Red Hat Linux and pre-Novell SuSE
Linux servers (unless those have been upgraded since I stopped working
there). The MediaWiki software is all PHP. MySQL is used for
databases. Thus, it’s a classic LAMP platform. There are some Perl and
Python scripts running about for various administrative tasks, but they
don’t represent any kind of measurable percentage of traffic load.

There are a lot of squid proxies used for caching to serve pages faster.
There’s some rudimentary load balancing (last I checked) that’s handled
at least in part by in-house scripting.

The Wikimedia Foundation uses zero .NET or Java, in case you were
wondering.

20 Google UK Like Google

INTERESTING FACTS

  • Not a single significant “safe” Java J2EE in the top 20.
  • Many proprietary variations with FreeBSD or Linux as the only common
    ground
  • Some .NET and Windows 2003 are indeed listed
  • Arguably the biggest web application is MySpace which is based on
    Coldfusion! Certainly not a “safe” choice by a long shot.

While MySpace is (again, based on what I’ve last heard) migrating to a
.NET foundation for its ColdFusion, it got to its current prominence
entirely on a ColdFusion 5 back-end, as far as I’m aware. Having never
been employed by MySpace, I of course cannot be as sure of this as I am
about Wikimedia Foundation information.

Food for thought, fellows. Any input on how Ruby would ever get
there, or who runs what, would be appreciated.

It’s also worth noting that Slashdot is Perl on Linux, I think via
Apache and MySQL (but don’t quote me on that unless I’m right).

Charles,

Good to know about eBay. As I said, I am guessing, and getting input was
precisely what I wanted… so it seems at least one Java site is out there.
Are there more?

Thanks Charles!

Jose L. Hurtado
Web D.
Toronto, Canada

On Sun, Sep 10, 2006 at 12:54:56PM +0900, Chad P. wrote:

I believe they’re still using some FreeBSD systems at Hotmail, and all
of Windows is behind free unix firewalls through a proxy service.

Arrrgh, typo. That should read “all of Microsoft”. Sorry.

Everybody talks about this “real” world as if it is different from the
one we all experience day to day.

I hope to one day experience this world, because I’m sure I’ll be
completely ready to rule them all with my vast knowledge of what works
in the “real” world.

Here’s to talking heads…

P.S. Scalability. And enterprise, don’t forget enterprise. My latest
benchmarks show that under certain conditions, some numbers are
produced.

Jose Hurtado wrote:

Folks,

Many people seem to forget that the word ‘scalability’
implies bidirectionality. I assert that Ruby scales
better than Java for most things:

      Small----------------------------------Large

Java <-------->
Ruby <------------------------------->

Have a nice day.

On Sun, Sep 10, 2006 at 07:06:36PM +0900, A. S. Bradbury wrote:

On Sunday 10 September 2006 04:54, Chad P. wrote:

The Wikimedia Foundation uses zero .NET or Java, in case you were
wondering.

This is really getting off topic, but Wikimedia do use Lucene for at least
the English search (compiled with GCJ however).

Hmm. I’d forgotten about that.

It’s kinda like splitting hairs, though – which is why I didn’t
remember it was technically written in Java.

On Sunday 10 September 2006 04:54, Chad P. wrote:

The Wikimedia Foundation uses zero .NET or Java, in case you were
wondering.

This is really getting off topic, but Wikimedia do use Lucene for at
least the English search (compiled with GCJ however).

Alex

Joseph wrote:

[snip pulling arguments out of your pinky finger]

You forget intranets. Internal company webapps have to serve humongous
amounts of traffic on not really lavish hardware. Listing the Fortune 20
of websites which indeed CAN afford to “just throw more servers at it”
tells us precisely nothing at all about technology scalability.

Windows probably sees more use for company backends than you can imagine
on account of being easy to set up and work with up to a certain scale
when you really need automation instead of an underpaid student support
gimp.

Also, your method of research is laughable.

  • Choosing one Framework or language over another seems to be mostly
    irrelevant as long as you stick to the underlying technology: FreeBSD,
    Linux Based server or Windows 2003 which appear consistently in the top
    web sites again and again.

Oh yes. Only the three most mainstream server OSs appear in that list.
Surprise.

  • Java and J2EE is by far absent from this list, this should tell us
    all something.

Or not, since the list is worthless data.

  • .Net is very present on the list, MS obviously is doing something
    right. The progress Ruby is doing with Windows is encouraging.

Good marketing, IIS comes with Windows, more straightforward than Java
to do MVC and deployment with. Makes it an easier first choice nowadays.

  • Choosing the best tools for the job can give you a big payout.
    MySpace is Coldfusion based, this is risky, but gives you the ability
    to write database web applications fast… and it has worked well. I
    would say the risk was worth it.

Worstofmyspace.com begs to differ. I completely ignore the very
existence of MySpace except for tidbits on the aforementioned site, but
if it’s remotely to be trusted, it’s far from stable and reliable.

David V.

David V. said:

You forget intranets. Internal company webapps have to serve humongous
amounts of traffic on not really lavish hardware. Listing the Fortune 20
of websites which indeed CAN afford to “just throw more servers at it”
tells us precisely nothing at all about technology scalability.

Aha… I would argue this is not true. Having worked for two large
corporations in the past with over 200,000 employees and huge
intranets, I can assure you some of the very worst delays and
website/web application designs are behind the closed doors of those
intranets. And I have yet to find one internal webapp that reaches the
scale of a public app… but then again, some government intranet
apps might truly be huge and similar in traffic to a major website.

David said

Also, your method of research is laughable.

OK… this is a very aggressive way of making a point, isn’t it, David?
I am tempted to reply, but I will ignore the attack and suggest you
give us a better research method in under 1 hour that works and does
not make us laugh, and then please do email us the results ; )

Now, don’t come back and tell us you need a few thousand dollars, a
month and a research firm to find out anything about this subject, when
something can already be known in under one hour, and that was my
point, to try to shed some light into this subject, and to receive more
information from others who may probably know more than I do. I
believe the open source movement calls this collaboration, and it
works.

Best Regards,

Jose Hurtado
Web D.
Toronto, Canada

On Sep 9, 2006, at 8:30 PM, Joseph wrote:

18 Orkut.com Linux unknown server
PHP?, PERL?

ASP.NET. Jeepers. Drop the guesses. -Tim

“Joseph” wrote:

INTERESTING FACTS

  • Not a single significant “safe” Java J2EE in the top 20.

That’s the top 20 web sites that users directly visit. It doesn’t
really give you visibility into what is running on the back end for the
sites that are included. Amazon, for example, uses a lot of J2EE.

Worse, it misses sites like the iTunes Music Store completely, which is
a huge enterprise application. iTMS is a WebObjects application, so it is
running on J2EE.

Friends,

As Tim B. suggested, I’ve done my best to drop the guesses from the
list, and show only information I know is either true or reported by
some credible source. Where no information is available, I just left a
question mark.

I have also updated the list with the information Chad P., Charles
Nutter and Tim B. added to it. This is the list so far, again open
for improvement:

1 Yahoo FreeBSD
PERL, PHP, Proprietary
“Also Python and Common Lisp” Chad P.

2 MSN Windows Server 2000/2003, Some FreeBSD
ASP, ASP.NET
“I believe they’re still using some FreeBSD systems at Hotmail, and all
of Windows is behind free unix firewalls through a proxy service.” Chad
Perrin

3 Google Linux based or unknown servers
Python, C, Proprietary, Java

4 Baidu.com Linux based unknown.
?

5 Qq.com Linux based unknown and Windows 2003.
?

6 MySpace Windows 2003 / 2000 some Linux unknowns too.
Coldfusion
“Migrating to BlueDragon.NET, which uses .NET as the back end for
ColdFusion… currently… on a ColdFusion 5 back-end” Chad P.

7 sina.com.cn FreeBSD, Solaris 8, Linux based unknowns,
?

8 Yahoo Japan Like Yahoo at 1.

9 163.com China FreeBSD and some Linux based unknowns,
?
    ?

10 Live.com Windows 2003, Linux unknown servers
ASP.NET

11 eBay.com Windows 2000/2003
PERL, Proprietary, Java J2EE

“eBay is running a crapload of Java… they used to be a solid ASP site
(pre-.NET) but switched to Java because the ASP stuff scaled
horribly…the site has Sun/Java branding…it’s probably safe to
assume Java’s involved.” Charles Nutter

12 Sohu.com China Linux unknown servers
?

13 YouTube.com Linux unknown servers
?

14 Yahoo China Like 1

15 Microsoft Windows 2003 / 2000, some FreeBSD at Hotmail, and UNIX
based firewalls.
ASP.net, ASP

16 Wikipedia Apache, very little FreeBSD
Mostly PHP, some minor PERL, Python and some Java for the
English search.

“a grand total of one FreeBSD server… The servers are primarily
running on Fedora Core 3-5…The MediaWiki software is all PHP. MySQL
…it’s classic LAMP platform.” Chad P.

“but Wikimedia do use Lucene [Apache Java based text search engine]
for at least the english search” A. S. Bradbury

17 Amazon.com FreeBSD, Linux unknown servers, Solaris 8, Netware
PERL, Proprietary, more?

18 Orkut.com Linux unknown server
“ASP.NET” Tim B.

19 Blogger FreeBSD, Linux unknown servers
?

20 Google UK Like Google

Bye again,

Jose Hurtado
Web D.
Toronto, Canada

On 9/10/06, Tim S. wrote:

Worse, it misses sites like the iTunes Music Store completely, which is
a huge enterprise application. iTMS is a Webobjects application, so is
running on J2EE.

You mean Objective-C for sure

–Tim S.


Two things are infinite: the universe and human stupidity; as for
the universe, I have not acquired absolute certainty.

  • Albert Einstein

I missed Amazon on the list I just sent; it should read J2EE there too,
as was mentioned:

“Amazon, for example, uses a lot of J2EE.” Tim S…

Regards,

Jose Hurtado

On Mon, Sep 11, 2006 at 05:08:06AM +0900, Robert D. wrote:

On 9/10/06, Tim S. wrote:
[snip the actual conversation]

Worse, it misses sites like the iTunes Music Store completely, which is
a huge enterprise application. iTMS is a Webobjects application, so is
running on J2EE.

You mean Objective-C for sure

Nope, WebObjects was long ago converted to Java.
Check out the site: http://www.apple.com/webobjects/

On 9/10/06, Logan C. wrote:

You mean Objective-C for sure

Nope, WebObjects was long ago converted to Java.
Check out the site: http://www.apple.com/webobjects/

I had the very faint hope that a free version of WebObjects was used;
these are still in Objective-C. Sorry if I wasted your time. I tried to
pay you back with the URIs for the two Objective-C versions, but I just
cannot get my hands on my last Linux-Mag France.
One obviously is GNU-Step, but the other…, sorry :(

Cheers
Robert


Two things are infinite: the universe and human stupidity; as for
the universe, I have not acquired absolute certainty.

  • Albert Einstein

On Mon, Sep 11, 2006 at 06:15:11AM +0900, Robert D. wrote:

One obviously is GNU-Step, but the other…, sorry :(

GNUstep would be excellent if it wasn’t buggy. Darnit.

“Joseph” wrote:

I missed Amazon on the list I just sent; it should read J2EE there too,
as was mentioned:

“Amazon, for example, uses a lot of J2EE.” Tim S…

I got that from one of Steve Yegge’s old blogs, and also someone who
left where I work to go to Amazon was hired to do Java coding there.

I can’t find a cite for Yegge’s blog entry, because his old site seems
to be down, and Google’s cache of it isn’t working either. :(