My Scaling Hero

It’s amazing what you can do without that much hardware. And this is even truer for a free service that doesn’t need to be up 99.99…% of the time.

But even so, you have to give credit where credit is due. Markus really knows what he’s doing. To be able to handle that kind of traffic by yourself, well it’s legendary!

Hey Now Jeff,

StackOverflow will scale; it’s a credit to ASP.NET.

Coding Horror Fan,

Catto

Alright! Thanks for the link. Now to log on and get laid! Yeeeeha!

Be careful when you say it uses ASP.NET. Part of why he was able to run so much on one server was that he didn’t use web forms, server controls, etc.; it was raw string concatenation (this is according to an interview he gave on Channel 9).
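For anyone who hasn’t seen the style being described: here’s a minimal sketch of rendering by raw string concatenation, in Python rather than ASP.NET (the field names and page layout are made up). The point is that nothing runs per request beyond string joins, with no view-state or control tree in the way:

```python
# Hypothetical sketch of "raw string concat" page rendering (not
# Markus's actual code, which was ASP.NET). Each row and the page
# are built directly from strings.

def render_row(user):
    # Build one table row directly from the record's fields.
    return "<tr><td>" + user["name"] + "</td><td>" + str(user["age"]) + "</td></tr>"

def render_page(users):
    # join() over a generator avoids repeated += reallocations.
    rows = "".join(render_row(u) for u in users)
    return "<html><body><table>" + rows + "</table></body></html>"

page = render_page([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
```

Nothing fancy, which is the point: there is no framework overhead to optimize away later.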

BTW, Markus was my answer to the Stack Overflow question “Who is your programming hero?”

@Mecki: We made sure that all heavy work-load is performed by MySQL

I’d have thought that you reach a point where you can only scale a database so far (i.e. better hardware), whereas if there is more logic in the web-layer, you can keep adding more servers to the web-farm to handle that load.

If you’ve got one shared server available (sounds like you do?) the MySQL heavy load seems fair - just wondering how you’d approach it if you had more servers available, and where you’d place the heavy lifting then?

to be frank, trashier than those on OKCupid.

You say that like it’s a bad thing :wink:

@Paul Keeble, someguy:

Rails is not scalable: the database abstraction will kill a site sooner or later. Twitter is a case in point (ironically, one cited most often by RoR advocates missing the point). It was scaled primarily with a massive amount of caching, which, of course, bypasses almost all of Rails’ clunky whirring and grinding.

You shouldn’t be surprised that fish scales :wink:

Actually, considering the hardware, I don’t see this as a scaling wonder. The hardest work is probably performed by the database servers, and we have no information on what giant performance beasts those are.

Handling 64,000 simultaneous connections can be achieved with much worse hardware. Each of his servers has 8 cores (2 quad-core CPUs) and he has two of them. That 16 cores, each at 2.66 GHz, can handle 64,000 connections is not really surprising: that works out to 4,000 connections per core, and that is doable at 2.66 GHz.

If I profile our web servers, I see that Apache actually accounts for only 20% of the workload (and that includes the PHP scripts running inside Apache), while 80% of the time is taken by MySQL performing database operations. So if we offloaded the database to an external server, we would have 80% of the CPU time to spare. OK, I must admit that we pay a lot of attention to making sure MySQL does the actual work wherever possible. PHP doesn’t do much more than build the SQL statement, fire it at the database, and put some HTML formatting around the results.

We made sure that all the heavy workload is performed by MySQL (if you can do something either in PHP or directly in SQL, you can bet that SQL will do it faster than you ever could in PHP). And now that MySQL has subselects, you can run much more complicated requests within a single SQL statement instead of fetching data into PHP and then building more SQL statements based on that data. Thus, whenever our server slows down and we check where the load comes from, the database is almost always the culprit.
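A minimal sketch of the pattern Mecki describes, using Python’s sqlite3 as a stand-in for PHP + MySQL (the tables and queries here are made up): a single statement with a subselect replaces the fetch-IDs-then-requery round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 'hi'), (1, 'again'), (2, 'yo');
""")

# Two-round-trip version: pull IDs into the app layer, then build a
# second query from them -- the pattern being argued against.
ids = [row[0] for row in conn.execute("SELECT id FROM users WHERE name = 'alice'")]
slow = conn.execute(
    "SELECT body FROM posts WHERE user_id IN (%s)" % ",".join("?" * len(ids)), ids
).fetchall()

# Single-statement version: the subselect lets the database do all the
# work in one round trip.
fast = conn.execute(
    "SELECT body FROM posts WHERE user_id IN "
    "(SELECT id FROM users WHERE name = 'alice')"
).fetchall()

assert slow == fast  # same result, half the round trips
```

On a real MySQL setup the win is bigger than it looks here, since each eliminated round trip also eliminates network latency and PHP-side result handling.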

Somehow we developers just can’t convince management that we must stop running the backend database on the same servers as the web frontend… they don’t want to listen to us. We could buy one set of really huge performance beasts to run the SQL backend and easily replace our web frontend servers with much weaker, more lightweight machines.

3 billion records sounds like a script needs to be run every so often to clear out old data; it’s probably message posts or something if it’s that large. Just archive it off and improve speed even more.
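A minimal sketch of that archive-then-delete idea, with Python’s sqlite3 standing in for the real database (the table names and cutoff are made up); wrapping both statements in one transaction keeps the copy and the delete atomic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, posted INTEGER, body TEXT);
    CREATE TABLE messages_archive (id INTEGER PRIMARY KEY, posted INTEGER, body TEXT);
    INSERT INTO messages VALUES (1, 100, 'old'), (2, 200, 'old too'), (3, 900, 'recent');
""")

CUTOFF = 500  # made-up timestamp; rows older than this get archived

with conn:  # one transaction: copy + delete succeed or fail together
    conn.execute(
        "INSERT INTO messages_archive SELECT * FROM messages WHERE posted < ?",
        (CUTOFF,),
    )
    conn.execute("DELETE FROM messages WHERE posted < ?", (CUTOFF,))

live = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM messages_archive").fetchone()[0]
```

On a 3-billion-row production table you would batch the copy/delete by ID range rather than run it in one shot, but the shape of the job is the same.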

Not to be an a**hole or anything, but I don’t find the design of SO that much better than that of PlentyOfFish. It’s a little better, but it should be, because you guys hired a real designer for it, right?

When you compare PlentyofFish with MySpace, it makes you ask the question:

What the hell is MySpace doing with all those extra servers?

@Mecki: Ok, I must admit that we pay a lot of attention to make sure that MySQL does the actual work wherever possible. PHP doesn’t do much more than building the SQL statement, firing it to the database and put some HTML formating around the results.

Refreshing to know that a youngster (you are, yes?) gets what databases are good for. Not that MySQL is spiffy for a transactional application.

2x database = 6x maintenance?
do you really need a database in that instance?
(might a filesystem suffice?)

Jeff, does your wife know that you’re spending so much time on PlentyOfFish?

ASP.NET controls are great for prototyping and rapid development. If you need to throw a site together and don’t want to start mucking around with HTML, JavaScript, and input checking, you could do a lot worse than what’s available. But it’s probably not a great idea to turn that prototype into your main site without a lot of optimization. When I was getting started, I never understood why some sites would use println after println; it made those pages difficult to update and maintain. It became apparent why soon enough (though I still hated maintaining pages with hardcoded HTML).

Also, after updating so many customer databases, it really becomes a nightmare to keep a database running. Sometimes it would be easier to just flush the entire thing and start fresh, but they have years of information stored on those systems. You learn to tiptoe around the problems, and you learn the importance of transactions.

Scaling issues are always the fault of some programmer who made a tool or library that you use which is not intended to be scalable.

So, in reality it’s always your fault :slight_smile:

The problem with free is that every time you double the size of your database the cost of maintaining the site grows 6 fold. I really underestimated how much resources it would take, I have one database table now that exceeds 3 billion records. The bigger you get as a free site the less money you make per visit and the more it costs to service a visit.

I’m not sure what to make of this. I’ve never run an ad-supported website, so maybe I’m wrong, but I thought advertisers paid more if their ads were exposed to more people. So higher traffic should mean more cash from advertisers, right?

@sp160n: Hmmm, www.okcupid.com is about 10x better in every way, and is also completely free.

1.2 billion page views per month, 500,000 average unique logins per day

30+ million hits per day, 500-600 per second

45 million visitors per month

top 30 site in the US, top 10 in Canada, top 30 in the UK

It doesn’t matter. I think that’s the point of Jeff’s post.