Scaling Up vs. Scaling Out: Hidden Costs

This has been touched on by others, but the costs of maintaining the scale-out option (more complex software, more operational issues, and so on) are not trivial, and you will see more hardware failures, though each individual failure should matter less thanks to the redundancy.

It’s probably a case-by-case decision, but I would generally favour a compromise approach as well. Scale up until it starts to get nasty expensive, and then look into distributing the workload. You can do a surprising amount with one or just a few machines.

I work for a large hosting provider, so for once I have some insight to share. A lot has changed in the datacenter space over the last few years. But first, the things that haven’t:

  • Separation of application and database duties is critical so that each can scale independently of the other - the databases scale up, and the applications scale out.
  • Databases typically scale vertically (what you refer to as “scaling up”), because their central data repository is hard to keep in sync across multiple servers. So you tend to see either “big iron” or very high-CPU systems acting as database servers. Clustering - whether active or standby - is a must. Systems that do scale out, like Oracle RAC, are frightfully expensive and difficult to maintain.
  • Application servers should scale horizontally. Just throwing processing power at something is a good sign You’re Doing It Wrong. If an application properly keeps its state in a remote backend database (see Very Large Database Servers above), then you can add as many application servers as you need, using pizza boxes or whatever is most efficient (a minimal sketch follows this list).
  • VERY few applications will scale linearly as processors are added, as most only have so much threading and parallelism built in. Serial operations are unavoidable and will limit the potential of these monster “scale up” solutions.
  • For horizontal scaling you’ll likely need external load balancing, which is another piece of hardware, and another thing to power, cool, and manage.
  • Power (and cooling, which is tightly tied to the wattage you’re pulling) is critical, so you’re likely to want to find a middle ground between a 1U pizza box and a 7U monster like the HP. There are also other issues such as storage, expansion slots, etc.
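
To make the “stateless application servers” point concrete, here is a minimal sketch (the server names are made up) of why statelessness is what makes scale-out cheap: any box can serve any request, so a load balancer only has to spread traffic, never track sessions.

```python
import itertools

# Hypothetical pool of 1U "pizza box" application servers; growing capacity is
# just a matter of appending to this list behind the load balancer.
APP_SERVERS = ["app01:8080", "app02:8080", "app03:8080"]
_rotation = itertools.cycle(APP_SERVERS)

def pick_server() -> str:
    """Plain round-robin; no session affinity is needed because the app
    servers keep all their state in the shared backend database."""
    return next(_rotation)

def route_request(user_id: int) -> str:
    # Any server will do -- reads and writes go to the central database,
    # not to memory or disk on the application box itself.
    return f"user {user_id} -> {pick_server()}"

if __name__ == "__main__":
    for uid in range(5):
        print(route_request(uid))
```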

Now as for what’s changed. In a nutshell, performance per watt.

  • It used to be you could just about count processors by rack unit. A 1U server was a single processor, 2U was dual, 4U was quad, etc. There were some minor efficiencies gained - 4U servers tended to become 3U over time - but not much. Until multi-core processors hit.
  • These days, you can pack quad-core processors into most servers. A 1U server can typically now reach 8 processing cores (2 x Quad-Core). So while that 32-CPU beast may be 7U, if you just need raw processing and memory, you’ll likely still come out ahead using small pizza boxes (see the back-of-envelope sketch after this list). Internal storage and expansion will be minimal, but if you’re using NAS or SAN it’s a non-issue, and you’ll get better storage management to boot. SAN avoids burdening your network and offers much higher performance, but at a price.
  • Licensing varies, but Microsoft in particular counts a single socket as one processor, regardless of the number of cores on it. This heavily favors multicore solutions and reduces the licensing penalty above.
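
A quick back-of-envelope sketch of the core-density point (the figures are illustrative, taken from the “2 x quad-core in 1U” and “32-CPU, 7U” examples above, not benchmarks):

```python
# How many cores fit in the same rack space as one big "scale up" chassis?
CORES_PER_1U_BOX = 8        # 2 sockets x quad-core in a 1U pizza box
BIG_BOX_RACK_UNITS = 7      # the 7U class of machine discussed above
BIG_BOX_CORES = 32

scale_out_cores = BIG_BOX_RACK_UNITS * CORES_PER_1U_BOX
print(f"Scale up : {BIG_BOX_CORES} cores in {BIG_BOX_RACK_UNITS}U")
print(f"Scale out: {scale_out_cores} cores in {BIG_BOX_RACK_UNITS}U of 1U boxes")
# Raw core count favors the pizza boxes; what it ignores is per-box overhead:
# power supplies, NICs, OS instances, and the licensing discussed above.
```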

As others have commented, your costs and numbers above only make sense if it actually took 83 of those servers to match up to the HP - something I doubt would be the case for most well-written applications. If we’re judicious and say that 16 would match it (personally I’d place bets you’d still be ahead), then those numbers look quite different.

So in an ideal world, you’d have a nice SAN backing your servers’ data needs, a bunch of small servers for the application tier (blade servers are actually quite nice in this niche), and a couple of large clustered servers for the database, with a gigabit network for communication between them.

As everybody knows, open source is only free if your time is free.

What are your system admin costs for 83 servers? Not pretty. Probably more per year in salary and overhead than the big-iron hardware. And don’t forget to factor in some kind of serious hardware for load balancing, and the administration of that.

I assume you wouldn’t put SQL Server on every machine, right? The configuration for clustering those would be kinda crazy, and probably not necessary.

Also, here’s a thought…how about doing a little analysis and actually architecting your server setup to handle your user load…THEN start comparing different solutions.

Joshua: nice summary, thanks. Rounded off the original post nicely.

Joshua: You, sir, are spot on. Thank you for typing that.

This is why I don’t do any ASP.NET work.

Interesting, but you can’t really scale SQL “out” that way without a whole lot of application redesign.

Other types of servers, though (like web servers), would scale out relatively simply.
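
To illustrate the SQL point, here’s a rough sketch (table and server names are hypothetical) of the redesign that scaling a database “out” forces on you. Once the rows are split across servers, the application itself has to route every query to the right shard, and anything that used to be a single JOIN or transaction may now span machines.

```python
SHARDS = ["sql01", "sql02", "sql03", "sql04"]   # hypothetical database servers

def shard_for(user_id: int) -> str:
    """The application, not the database, now decides where each row lives."""
    return SHARDS[user_id % len(SHARDS)]

def profile_query(user_id: int) -> tuple[str, str]:
    # Every data-access path in the application needs this routing step added.
    return (shard_for(user_id),
            f"SELECT * FROM profiles WHERE user_id = {user_id}")

if __name__ == "__main__":
    print(profile_query(42))   # ('sql03', 'SELECT * FROM profiles WHERE user_id = 42')
```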

I guess the problem is that you pay the same price for the Windows/SQL Server software that runs on XXXX CPUs as for the version that runs on XX CPUs. I remember that back before the multi-core age you used to pay for your software based on how many CPUs it would run on. That kind of thing scares away small developers, because they can’t afford their small server costing more in software than in hardware. MS should make a Windows Server license that runs on at most 4 CPUs and is way cheaper.

Reminds me a lot of a calc 1 optimization problem, or trying to debate the merits of a 10x1 rectangle vs. a 1x10 rectangle.

Clearly, the correct answer lies somewhere in the middle, far from the two extremes you’ve presented.

I think the biggest tactical advantage for scaling out is that you can buy capacity a couple of grand at a time and install it without taking down the capacity that’s already running.

Other than that, the big iron starts to look pretty good. I think this is a central part of the appeal of VM-based server consolidation – to a certain extent, you can get the best of both worlds.

Great breakdown, though - thanks.

Something that others didn’t seem to mention: when scaling out, you’re no longer running a single instance of SQL Server and Windows. With 83 separate servers you now have 83 instances of Windows and 83 instances of SQL Server gobbling up hard drive space and RAM. With 40 TB of storage at your disposal, drive space probably won’t be an issue, but I would imagine that SQL Server behaves quite differently with access to 8 GB of RAM as opposed to 512 GB.

Ubuntu 9.04 64bit Server + Apache 2.2 + MySQL 5 + PHP + OpenJDK 1.6 + Red5 0.8 = $0

There are some things money can’t buy. These things are known as open source software :slight_smile:

With the right planning, software licensing really can cost $0.

“scaling out is only frictionless when you use open source software”

correction…
scaling out is only frictionless when you use low/no cost software.

It’s the per-server (or worse, per-CPU) licensing that frequently kills the non-open-source approach (though licensing is not the only thing that sets open source software apart).

If you have a ‘site’ license for whatever product you are running, however, it may not be open source but may have a fixed cost, and you can still scale horizontally.

For a search engine (pre-Google days) we migrated from a large 32-CPU SGI machine to 70+ commodity servers, allowing us to cancel annual maintenance costs that were easily in the mid six figures (at the time). We tried to go horizontal with a J2EE engine, but found it was licensed per server, and horizontal scaling was limited by license costs alone.

@Miff

“This is why I don’t do any ASP.NET work”

Because the 13th most popular site in the US needs a big database server?

Thanks to Joshua Ochs – great “scaling-up vs scaling-out” overview!
Thanks to Jeff too for asking the question. :slight_smile:

I did notice Plenty of Fish had got a bit faster recently :wink:

Reminds me of the old “Grudge Match”: a rottweiler vs. a rottweiler’s weight in chihuahuas.
www.grudge-match.com/History/rott-chi.shtml

Scaling out implies several hidden “costs”, such as synchronization overhead (replication and the like) and load-balancing overhead.

Scaling out can look nice but sometimes it will not be as nice as it looks.

The site I’m working at is shifting from a “scaled up” approach to a “scaled out” approach.

We currently have a few hundred thousand users running off four or five SQL Server back-end boxes configured quite similarly to the big-end ProLiant mentioned, plus further application servers and front-end boxes.

Our problem is threefold:

  • huge bottlenecks into the SQL databases
  • the application tier is monolithic and hard to update or scale
  • the SQL Server licence costs are outrageous

So the solution chosen has been to rearchitect, from the ground up, into a set of 20-30 small services.

At the bottom, the core services each have their own database. Above that, a coordinating layer performs the business logic. At the top, an MVC layer handles presentation.

All services are stateless, so scaling out should be very simple: just add another box where necessary in the middle tiers.

We are replacing the single, fairly monolithic database tier with a series of small databases that should share the load between them, and moving the business logic out of stored procedures and into the services. This also means we can move over to MySQL.
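
A minimal sketch of that layering (service, table, and file names are invented for illustration; sqlite3 just stands in for a small per-service MySQL database):

```python
import sqlite3

class AccountService:
    """Core service: owns its own small database; nothing else touches it."""
    def __init__(self, db_path: str = "accounts.db"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, name TEXT)")

    def name_of(self, account_id: int) -> str:
        row = self.db.execute(
            "SELECT name FROM accounts WHERE id = ?", (account_id,)).fetchone()
        return row[0] if row else "unknown"

class BillingService:
    """Another core service, with its own separate database."""
    def __init__(self, db_path: str = "billing.db"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS invoices (account_id INTEGER, amount REAL)")

    def total_owed(self, account_id: int) -> float:
        row = self.db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE account_id = ?",
            (account_id,)).fetchone()
        return row[0]

class StatementCoordinator:
    """Coordinating layer: business logic lives here, not in stored procedures.
    It holds no per-request state, so you can run as many copies as you need."""
    def __init__(self, accounts: AccountService, billing: BillingService):
        self.accounts, self.billing = accounts, billing

    def statement(self, account_id: int) -> str:
        name = self.accounts.name_of(account_id)
        owed = self.billing.total_owed(account_id)
        return f"{name} owes {owed:.2f}"

if __name__ == "__main__":
    coord = StatementCoordinator(AccountService(":memory:"), BillingService(":memory:"))
    print(coord.statement(1))   # "unknown owes 0.00" against the empty databases
```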

The downside? 150+ developers for about a year!