Choosing Dual or Quad Core

I'm a big fan of dual-core systems. I think there's a clear and substantial benefit for all computer users when there are two CPUs waiting to service requests, instead of just one. If nothing else, it lets you gracefully terminate an application that has gone haywire, consuming all available CPU time. It's like having a backup CPU in reserve, waiting to jump in and assist as necessary. But for most software, you hit a point of diminishing returns very rapidly after two cores. In Quad-Core Desktops and Diminishing Returns, I questioned how effectively today's software can really use even four CPU cores, much less the inevitable eight and sixteen CPU cores we'll see a few years from now.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/09/choosing-dual-or-quad-core.html

Thanks for that! What about servers? web applications, database… Will quad cores systems add benefit there?

depends if you want a single application to go faster or you have several apps you want to go faster.

Say… Running Several instances of Visual Studio and a VMWare… etc etc

Stuff like 3D rendering, or compositing applications, or pretty much anything dealing with processing images, can very easily be split into regions, for rendering by separate cores.
Modo (3d app) is the most obvious example of this.
When you render something with two cores, you see two little blue boxes processing a segment of the image. If you have 4, you see four little boxes. If you have two quad-core machines doing network-rendering, you see four blue boxes (local cores rendering), and four orange boxes (the remote box rendering)
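As a rough sketch of how that region splitting works (the strip bounds and `render_tile` stand-in below are my own illustration, not Modo’s actual internals):

```python
# Split an image into one strip per core and "render" each strip in
# parallel, in the spirit of bucket rendering. render_tile is a stand-in
# for real shading work.
from concurrent.futures import ProcessPoolExecutor
import os

def render_tile(bounds):
    """Pretend to render one rectangular region; return its pixel count."""
    x0, y0, x1, y1 = bounds
    return (x1 - x0) * (y1 - y0)

def split_into_tiles(width, height, n):
    """Divide the image into n horizontal strips, one per core."""
    step = height // n
    return [(0, i * step, width, height if i == n - 1 else (i + 1) * step)
            for i in range(n)]

if __name__ == "__main__":
    cores = os.cpu_count() or 2
    tiles = split_into_tiles(640, 480, cores)
    with ProcessPoolExecutor(max_workers=cores) as pool:
        total = sum(pool.map(render_tile, tiles))
    # every pixel is covered exactly once
    assert total == 640 * 480
```

Because each strip is independent, this kind of workload scales almost linearly with core count, which is exactly why renderers show off multi-core so well.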

Even I, not the best coder in the world, could work out how to write a renderer to take advantage of multiple cores.
Whereas with games, I can’t think of anything that could utilize the spare CPU cores…
I wonder if it’s even remotely possible to use extra cores as “software graphics cards”. Since graphics are the only thing that really needs lots more processing power in games, it’d make sense to divide the screen up between them, and use the remaining two cores to process extra effects on their area of the screen. Biggest problem being that CPUs aren’t as fast at drawing stuff to the screen as graphics cards are…

But, yeah… For gaming, dual (or even single) core processors are more than enough. CPUs are generally not the bottleneck for games.
But… for 3D/compositing workstations, a quad-core CPU (or dual-CPU quad-core) does substantially speed up rendering.

Another thought, to add to this slightly rambling comment:
MP3 encoding. Instead of speeding up a single MP3 encode, why not have the application process 4 different files at once? It’d be much simpler to code, since you don’t need to worry about parallelizing the encoding process; you just fork the encoding once for each core.
Since encoding the same MP3 over 4 cores probably wouldn’t speed it up that much (the cores would spend more time coordinating than actually processing bits), completing four files at a time would finish the whole batch faster.
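A minimal sketch of that file-at-a-time idea (`encode_file` is a hypothetical placeholder, not a real encoder; a real version would shell out to something like LAME, which is also why threads suffice here, since the heavy lifting would happen in child processes):

```python
# One whole file per worker: no need to parallelize the encoder itself.
from concurrent.futures import ThreadPoolExecutor

def encode_file(path):
    # Placeholder: a real version would invoke an external MP3 encoder.
    return path.replace(".wav", ".mp3")

def encode_all(paths, workers=4):
    # Each worker takes a complete file, so the encodes never coordinate.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_file, paths))
```

With four independent files in flight, four cores stay busy without any of the tricky intra-file parallelism.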

Please don’t use red and green text that are otherwise identical (same saturation, value, font, and so on) to differentiate positive and negative results. Yes, there is a minus sign in front of the negative results, but that is slow for the brain to latch on to, especially since the font is rather thin.

There are many other ways to visually separate good and bad results, almost all of which are better than just red and green and no other differentiation. I’ve seen some beautiful and effective choices, though many tend to bias the reader (bolding the bad results, for instance). Personally I find just replacing the green with blue to be quite effective.

This is the sort of comparison you see all the time, and it may be an incredibly stupid question, but instead of seeing how one application does across multiple CPU cores, I’d like to know how well the operating system distributes several applications across cores.

Or is that not how it works?

Because if I can get four apps working at higher performance, sometimes that’s a better scenario.

Of course there is also the point to be made of how many apps have ever been written to take advantage of multiple cores yet?

A lot of the ones I work with there just isn’t the need.

Actually with all the background processing and everything I’d love to see how some of the WPF apps coming out are going to go.

The issue is certainly the software. I think that pretty much everything is parallelizable; the issue is that it isn’t parallelizable within our current programming paradigms. I think the question you should be asking is why we really need more powerful computers. The answer, I believe, coincides closely with the list of things multiple cores are good for!

You say “Unfortunately, CPU parallelism is inevitable.”. Unfortunately? I think not! This is the opening for revolution in computer architecture and programming languages.

dbr - Games can actually be very easily parallelized, and could easily take advantage of all the power offered by quad, 8, 16 cores, etc. The issue is that the game engine has to be written with parallelization in mind (it looks like Valve is doing this), and the benefit isn’t huge when not many players have duals or quads. It wouldn’t be unreasonable to devote one core to managing the graphics card(s), loading content, etc. One or two cores could do physics work, in the absence of a PPU. I’d love to see a game that actually benefits from an entire core devoted to AI and gameplay. It is nearly impossible to design a game so that it smoothly scales from a single core to several cores, though; it changes the game too much.

@dbr:

There is actually quite a bit that can be sped up in games, outside of the pure rendering aspect. For instance, depending on the algorithms used, AI can often be separated into global and individual “thinking” – the latter can be distributed across cores. Even with a purely global AI design, simply moving the entire AI subsystem to a separate core may work well.

Then of course there’s the sound subsystem, which can decently chew CPU when a great many environmental sound effect tracks are mixed by a 3D audio engine. Again, that can be thrown in its own thread.

And then there is physics. Some of it can be parallelized, and other pieces can’t – but certainly physics can be overlapped with rendering. Because the non-destructible portions of the environment will be unaffected by the results of the physics calculations, those can be rendered while physics is still being run for a given frame. Also, once the physics is completed for a given frame, the results can be passed off to the renderer while the physics computation begins on the next frame.

Weather and other complex but slowly-changing environmental effects can be computed in separate threads that post results asynchronously to the main engine. Networking/world synchronization can run in a thread of its own.

And the list goes on … Mind you, such a heavily threaded design is not necessarily easy, but it certainly is worth the work, as Valve has been quick to point out.
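The physics/rendering overlap described above is essentially a two-stage pipeline; here is a minimal sketch (the stage bodies are placeholders, not real engine code):

```python
# While the "renderer" consumes frame N, the "physics" thread is already
# producing frame N+1. A bounded queue is the handoff point.
import queue
import threading

def physics_step(frame):
    return frame * frame  # stand-in for simulation work

def run_pipeline(num_frames):
    handoff = queue.Queue(maxsize=1)  # physics posts, render consumes
    rendered = []

    def physics_thread():
        for frame in range(num_frames):
            handoff.put((frame, physics_step(frame)))
        handoff.put(None)  # sentinel: no more frames

    def render_thread():
        while True:
            item = handoff.get()
            if item is None:
                break
            rendered.append(item)  # stand-in for drawing the frame

    workers = [threading.Thread(target=physics_thread),
               threading.Thread(target=render_thread)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return rendered
```

The bounded queue is what lets the two stages run concurrently without the physics thread racing arbitrarily far ahead of the renderer.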

Yep. Main benefit of multi-core on the desktop is avoiding excessive context switching. As you say, diminishing returns above two unless the software has been explicitly parallelized.

What about servers?

Servers are totally different scenarios. There are plenty of users who believe their desktop usage scenarios are similar to servers, but it’s utter wishful thinking on their part…

if you want a single application to go faster or you have several apps you want to go faster.

Within reason, yes, but dual-core gets you 99% of the benefit of (n) core. If you’re not careful, this becomes the wishful thinking scenario I just described. No matter how much of an ultra-elite-ninja single user you are, I guarantee you’re not generating anything close to the kind of load that a server would experience under even the mildest of loads. Desktops aren’t servers.

Where as with games, I can’t think of anything that could utilize the spare CPU cores…

http://news.zdnet.com/2100-9584_22-6119913.html

One such company is Remedy, which demonstrated a game called “Alan Wake” at the Intel show.

The game is designed to farm tasks to different processor cores, said Markus Maki, director of development, in an interview. There are three major program threads and each can occupy a core of its own: one for the main game action, one for simulating physics of game objects and one for preparing terrain information that’s later sent to the graphics chip for rendering. A fourth core can handle other threads, including playing sound and retrieving data from a DVD, Maki said.

I have yet to see a single game that shows anything close to the kind of scaling that we regularly see with rendering or encoding.

Approaches like this sound good on paper, but developers are seriously hobbled by the existing market of single and dual core CPUs. They have to write AI that can scale between an entire core on a quad-core machine, 1/2 of a core on a dual, or 15% of CPU time on a single.

Why look at today’s programs’ performance with tomorrow’s CPU setups? Surely, in time, programs will be written to take advantage of multiple cores. Remember there was a time when “no user of a PC” would need more than 640K of RAM :wink:

Where’s the 2 to 4 core comparison for Visual Studio and other compilers?

If you can find these kinds of benchmarks, then godspeed. They’re rare. The very first link in this post contains one compilation benchmark, but it’s dual-core:

http://www.codinghorror.com/blog/archives/000285.html

This review shows no scaling improvement for quad-core in Visual Studio 2005 compilation:

http://xtreview.com/review212.htm

The gcc compiler does support multiple cores and seems to scale fairly well:

http://www.phoronix.com/scan.php?page=article&item=585&num=4

Cheat sheet for the last graph: E5320 is quad-core at 1.86 GHz; E5150 is dual-core at 2.66 GHz.

single E5150 – 12.06 sec
single E5320 – 11.08 sec

http://techreport.com/articles.x/11237

I think there’s a better article, though I don’t have time to find it now. One of the interviews with Valve about multi-core support explains some of the benefits and difficulties of programming for multi-core (or rather, adapting existing code for multi-core).

Con: Amdahl’s Law
http://en.wikipedia.org/wiki/Amdahl%27s_law

Pro: Reevaluating Amdahl’s Law
http://www.scl.ameslab.gov/Publications/Gus/AmdahlsLaw/Amdahls.html
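For reference, Amdahl’s Law (from the Wikipedia link above) says that if a fraction p of a task can be parallelized across n cores, the overall speedup is 1 / ((1 - p) + p / n); a quick sketch:

```python
def amdahl_speedup(p, n):
    """Overall speedup when fraction p of the work runs on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 95%-parallel task tops out at 20x no matter how many cores you
# add: amdahl_speedup(0.95, n) approaches 1 / 0.05 = 20 as n grows.
```

This is the “con”: the serial fraction, not the core count, quickly becomes the limit, which is exactly the diminishing-returns effect in the benchmarks above.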

The Xbit Labs review can’t have activated the “threads=x” option for Xvid. Xvid encoding on a quad-core Mac Pro, either from the command line or from Handbrake, maxes out all four cores and hits about a 95 fps encoding rate (with all the quality options on).

But in the meantime, clock speed wins most of the time. More CPU cores isn’t automatically better.

More CPU cores still allow you to run more applications with less contention for CPU resources (you may get starved for memory bandwidth though).

In this day and age of Firefox and IDEs like IntelliJ/Eclipse/Visual Studio (while I do love them, you can’t consider them lightweight on either memory or CPU), having more CPU cores allows your computer to stay responsive even while you’re running Firefox and your IDE and some expensive compilation, and more, without having to rely on nicing processes.

.NET compilation gets some multi-core love with the 3.5 Framework[1]. I’ve been using this for a while on my home projects. It helps a bit, but not a ton. If you have a lot of projects and a clean dependency graph, it can shave a decent amount of time off the total build, but it varies a lot.

  1. http://blogs.msdn.com/msbuild/archive/2007/04/26/building-projects-in-parallel.aspx

Now, even there the drop-off is significant after /m:2. On my Q6600 @ 3.3 GHz, running with 4 build nodes (/m:4) is rarely any faster than running with 2 (/m:2). Here are some fresh timings for a clean build on a small-to-medium size project:

/m:1 - 4.39s, 4.24s, 4.71s (4.45s avg)
/m:2 - 3.58s, 3.65s, 3.60s (3.61s avg)
/m:3 - 3.86s, 3.52s, 3.74s (3.70s avg)
/m:4 - 3.19s, 3.75s, 3.86s (3.60s avg)

This is around 2.5 MB of source code spread out over 16 projects.

Even so, I’m pleased with my Q6600. It wasn’t very much more expensive than the dual core, and usually there are quite a few things going on besides compilation to take advantage of the extra power.

I’m very surprised that the Erlang fan boys haven’t jumped in here yet.

They don’t care, their software makes use of multiple cores just fine already.

I wonder, what about Stackless Python?..

Re. Visual Studio, isn’t it a breach of the license terms to publish performance information? It certainly is with SQL Server! This would explain why there isn’t any data out there.

One area where quad is definitely better is optimization. With a dual core it’s difficult to optimize for quad, but with a quad you can optimize for quad-, dual-, and single-core systems.

Anytime you’re optimizing parallel code for a wide audience, you want more cores. And with quads about to become the baseline (and already very cheap), choosing dual cores is probably no longer the right choice.