I've been following Michael Abrash for more than 10 years now; he's one of my programming heroes. So I was fascinated to discover that Mr. Abrash wrote an article extolling the virtues of Intel's upcoming Larrabee. What's Larrabee? It's a weird little unreleased beast that sits somewhere in the vague no man's land between CPU and GPU:
This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2009/04/i-happen-to-like-heroic-coding.html
The point is that lower-end systems have either low performance or low quality GPUs, and drivers of similar aptitude. On the one hand you have the daunting task of supporting N crappy GPUs with M driver revisions and P system configurations. On the other hand, you can write detection code to figure out when you’re not on a nice Core 2 + NVIDIA GeForce / ATI Radeon, and default to Pixomatic. For something like World of Warcraft, I bet this would work nicely and save them a lot of money otherwise spent on getting their customer support staff drunk because they’re unhappy all the time. Even the Atom has more than one hardware thread, and thus would get gains from Pixomatic 3’s multithread optimizations.
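The detect-and-fall-back idea this comment describes could be sketched roughly like this (a minimal illustration only; `GpuInfo`, `KNOWN_GOOD_GPUS`, and `choose_renderer` are hypothetical names, not from any real engine):

```python
# Hypothetical sketch: pick a renderer based on detected hardware.
# A real engine would query the driver/OS for this info; here it is
# passed in directly.

from dataclasses import dataclass

# Vendors whose hardware path we have actually tested and trust.
KNOWN_GOOD_GPUS = {"NVIDIA", "ATI"}

@dataclass
class GpuInfo:
    vendor: str                # e.g. "NVIDIA", "ATI", "Intel"
    driver_version: tuple      # e.g. (182, 50)
    min_tested_driver: tuple   # oldest driver revision we validated

def choose_renderer(gpu: GpuInfo) -> str:
    """Use the hardware path only for a known-good GPU on a tested
    driver; otherwise fall back to a software rasterizer (Pixomatic)."""
    if gpu.vendor in KNOWN_GOOD_GPUS and gpu.driver_version >= gpu.min_tested_driver:
        return "hardware"
    return "software"
```

So a GeForce on a recent driver gets the hardware path, while an unknown integrated chip, or a known GPU on an ancient buggy driver, quietly drops to software rendering instead of crashing in support-ticket-generating ways.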
It’s been almost 6 months since you wrote about your current video card:
Isn’t it time you fed your video card addiction and bought a new one? Can I have this one, then?
(The third comment in that article is yours, and you tell an Argentinian guy that you usually sell old video cards on eBay. I’m Brazilian, sell this one to me!)
Personally I think Larrabee and multi-core software rendering is a pretty good idea in light of the GHz race coming to a halt.
It’s almost certain that Intel is working on 8+ core chips, with each core having Hyperthreading.
It’s all about scaling and simplicity. One platform, one architecture that handles all your computations is a great plus for consumers, vendors and developers. As workloads, including graphics, get more parallelizable it makes a lot of sense to stop putting money in developing specialized platforms, and concentrate all efforts on improving the CPU platform.
In the short term, you can’t go wrong with today’s graphics chipsets, but in 10 years time they will be obsolete.
HEROIC CODING IS WARRIOR CODING
You should also check out Tim Sweeney’s talk about the Twilight of the GPU, where he talks about Larrabee.
Just a thought … if it were not for the use of assembly language, maybe Pixomatic could be ported to a customized V8 rendering engine, in NewAge HeroicCoding?
I don’t know the subject well enough, just that Google Chrome is getting rave reviews for speed of rendering and processing.
You are all missing another market for Larrabee: embedded applications, where a lot of compute-intensive work is still being done with custom chips or FPGAs. Larrabee will have the horsepower to take over these tasks and make these systems more fully programmable.
webbonehead, are you invoking Atwood’s Law?
On my machine, this took 1.98 seconds in Chrome to render the original JS RayTracer scene at 320x200.
(Make SURE you turn off the “display image while rendering” option!)
@Niniane and @Rick
I hear you, but I see zero actual shipping games that have 3D software rendering options. Microsoft’s software renderer was a reference implementation, meaning it was meant for accuracy, not speed.
Can you point to games other than UT2004 that have a software rendering option, or that ship with Pixomatic?
For one thing, it takes a BLAZINGLY fast cutting edge CPU to do well, and guess what machines with crap GPUs tend to not have?
I’m not sure if this changed in Windows 7, but I do believe that the Abrash and Sartain code represents best possible performance. I don’t think you can do better, and it’s still… not great.
We’ll see if Larrabee and future CPUs change that or not.
The Sims games (and I think Spore) use Pixomatic, so did some version of Flight Simulator. Big enough titles for you?
Many machines with crap GPUs actually have surprisingly powerful CPUs, certainly fast enough to run a mid-level title at a low resolution. Just look at the figures Abrash quotes in his article. You don’t need a blazing fast CPU; not everybody is making or playing games with UT2004-level graphics.
As I pointed out it’s not always a case of crap GPUs, the default drivers that ship with hardware are notorious for being buggy and many (probably the majority) of PC users never update them.
As I also mentioned, the software rasterizer in Windows 7 is NOT the refrast (Reference Rasterizer) that’s been present since DX6, which was a) incredibly slow and b) needed to be turned on via a registry key.
The question is whether it will be more effective to emulate a GPU in x86 or emulate x86 in a GPU. It looks like AMD are betting massively on the GPU and Nvidia have made their intentions clear with CUDA.
Intel are between a rock and a hard place here (even though they look like the king of the world right now!). Every time they bump the number of cores up, they push the whole software industry towards parallelizing their code. And that brings their code more and more within the grasp of the GPU.
I wonder if Apple have chosen the losing processor design again…
Are you tired from the bikeshed-effect comments on the previous three posts? This post isn’t of interest to most people.
It’s only a matter of time, in my eyes, before NVidia releases a motherboard chipset that runs all its general-purpose processing through the GFX card(s) you have installed. No CPU required.
Although Larrabee uses the x86 instruction set, it does have specific extra instructions for graphics processing. I think what Intel have done is hardcode the things that are slow in x86 into the cores. Remember they also have an overall core to handle frame output and, I would guess, anti-aliasing.
Also we are looking at an initial version having maybe 16 cores each running 4 threads. Since GPU processing can generally be run in parallel this is going to be very different from running pixomatic on a current CPU.
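To illustrate why this kind of rendering work spreads so naturally across many cores, here is a minimal sketch of scanline-parallel rendering (the `shade` function and band-splitting scheme are invented for illustration; this is not Pixomatic’s or Larrabee’s actual code, and in CPython the GIL limits true parallelism for pure-Python work, so this shows the structure only):

```python
# Sketch: each worker shades a contiguous band of scanlines independently,
# the way a many-core chip can split a frame. Bands share no state, so no
# locking is needed.

from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, WORKERS = 320, 200, 4

def shade(x, y):
    # Placeholder per-pixel work; a real renderer would rasterize triangles.
    return (x * 31 + y * 17) & 0xFF

def render_band(y0, y1):
    # Shade rows y0..y1-1; each band is independent of the others.
    return [[shade(x, y) for x in range(WIDTH)] for y in range(y0, y1)]

def render_frame():
    band = HEIGHT // WORKERS
    ranges = [(i * band, (i + 1) * band if i < WORKERS - 1 else HEIGHT)
              for i in range(WORKERS)]
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        bands = pool.map(lambda r: render_band(*r), ranges)
    frame = []
    for b in bands:
        frame.extend(b)
    return frame
```

With 16 cores running 4 threads each, you would carve the frame into 64 such bands instead of 4; the structure is the same, which is exactly why this workload scales with core count.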
I’m betting on Intel; let’s face it, they know what they’re doing.
Hopefully some or all CPU operations can be passed off to the GPU cores without the special coding you have to do with CUDA. Seems possible since it’s x86.
No, not at all pointless. I’d think that anyone who is looking to implement OpenCL in a processor-agnostic fashion would want to know about all this, and in detail. There’s an unending back-and-forth battle between powerful general-purpose CPUs and powerful specialized (and therefore multi-vendor and driver-dependent) processors, and Pixomatic sets a benchmark for what’s actually possible.
This LRB thing never got me excited.
As said, it sits somewhere between good-enough and not-so-bad.
We know it isn’t going to be fast, but thanks to the more generic approach it won’t be nearly as slow when things go wrong.
However, is it good?
If something is good at task X and something else is good at task Y, you know where to send the workload… but what if the performance patterns are similar, with one just sometimes better?
I don’t really like heroic programming. One of the things on my to-do list is to reimplement an old (I could almost call it legacy) system poking around here which, for some not-well-understood reason, started taking 10x the time to process (considering CPUs have gotten at least 2x faster in the meanwhile, this isn’t a good thing). It was far below the requirements when it was deployed. I have little clue, and I’m not excited.
I wonder if it will be possible to use a mixed approach as a way to lower the bottleneck on GPUs in laptops. Today, most laptops have the bottleneck on the GPU, while there is tons of CPU power to give and waste. If part of the rendering could be done on the wasted CPU power, we could see significant FPS gain on lower end GPU cards.
From what I read about Larrabee, it was made for laptops that can’t afford (price, space, energy consumption) to have a strong GPU card.
But anyway, the tendency now is for games to use a lot more CPU power because of physics effects (anyone played GTA4 on a dual-core CPU? The FPS gain from going quad-core is absurd).
Kudos @Niniane. Excellent points.
@Jeff Atwood, as far as this:
but I do believe that the Abrash and Sartain code represents best possible performance
I’m glad you’re not a researcher! I bet you are too, as they tend to have to prove those kinds of statements.