To ECC or Not To ECC

Eamon_Nerbonne · November 22, 2015, 10:46am

codinghorror:

One thing I noticed: letting the CPU directly control clock speed switching, rather than the OS, helps some benchmarks a bit. This “Speed Shift” is new to Skylake.

Examining Intel's New Speed Shift Tech on Skylake: More Responsive Processors

Compared to Speed Step / P-state transitions, Intel's new Speed Shift terminology changes the game by having the operating system relinquish some or all control of the P-States, and handing that control off to the processor. This has a couple of noticeable benefits. First, it is much faster for the processor to control the ramp up and down in frequency, compared to OS control. Second, the processor has much finer control over its states, allowing it to choose the most optimum performance level for a given task, and therefore using less energy as a result.

Results

The time to complete the Kraken 1.1 test is the least affected, with just a 2.6% performance gain, but Octane's scores show over a 4% increase. The big win here though is WebXPRT. WebXPRT includes subtests, and in particular the Photo Enhancement subtest can see up to a 50% improvement in performance.

This requires OS support. Supposedly the latest version of Win10 supports it.

In general I trust AnandTech a lot for benchmarks. They used Chrome 35 for each of the JS benchmark tests. With my mildly overclocked i7-6700k and Chrome 36 x64 on Windows I get 45,013 on Octane and 655 on Kraken which is consistent with what AnandTech saw.

As you yourself note in this quote, the advantage in Kraken is just 2.6%, i.e. a tiny fraction of that 28% gain. Also, since AnandTech has had 6700K scores up for a while and Speed Shift isn’t yet in mainstream Windows, it’s likely not included in these scores. That’s further corroborated by the fact that the most affected benchmark, WebXPRT, is virtually identical in AnandTech’s 4790K and 6700K benchmarks.

Additionally, if Speed Shift were the explanation for the gain, you’d expect my personal Kraken and Octane benchmarks to corroborate AnandTech’s scores. Instead, my slower 4770K scores considerably higher than AnandTech’s 4790K.

I’m not sure if it really is due to JS engine improvements, but that does fit the facts. In any case, AnandTech’s scores for the 4790K are much, much too low, which makes Skylake look better than it really is by comparison. I’m betting on JS engine improvements partially because modern browsers try to make it difficult to avoid updating the browser, so it’s plausible they got a faster JS engine without ever intending to.

Regardless: even if JS performance is much faster, almost all other benchmarks don’t mirror this. You might get lucky and find that ruby perf is like JS, but that’s a long shot. It’s much more likely that ruby perf will be like the vast majority of other benchmarked workloads: largely unchanged.

ultimape · November 22, 2015, 8:20pm

Curious if you’ve compared prime95 to something like linpack. From my limited experience testing hundreds of various laptop machines for stability, battery refreshing, and runtime statistics, linpack (intelBurnTest?) seems to kill them a lot faster. They just released a new version: https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download

IMHO, prime95 etc are good for benchmarking - having a consistent test across systems is a good idea - but for burn in and stability testing, I much prefer the one made by the chip manufacturer.

Recently I located the source of an overheat fault with my little brother’s rig using furmark and linpack simultaneously that I couldn’t recreate with prime95/mprime running all night. The only other way was a random 2 hours of gaming. Furmark+IntelBurnTest got to the failure state in under 10 minutes. It may be a chip-dependent issue, but I remember reading that linpack uses proportionally more of the on-board circuitry in the CPU to get its computation done due to the way it does its floating point calculation, hence a more robust heat generation tool.

I would love to be proven wrong about it, and I have no idea if it makes a difference for servers

codinghorror · November 23, 2015, 10:40pm

Whenever I have built and overclocked systems, prime95 has been 100% reliable in detecting whether they are stable for me. If prime95 fails overnight, not stable, if it passes… no CPU stability issues at all. I have no idea whether it is the “ultimate” tool but for CPU overclocks it has been incredibly reliable at detecting CPU instability for about a decade now…

Here are some numbers @sam_saffron recorded for Discourse:

build master docker image
build01 16:00
tf21     5:33

This is not a great test since our build was running in a VM on an eight core Ivy Bridge Xeon which has a lower clock speed, and a RAID array of traditional hard drives. Nowhere near an apples-to-apples comparison. But, 3x faster!

running Discourse (Ruby) project unit tests
tf9   8:48
tf21  4:54

This is running the Discourse project unit tests in Ruby. It’s a perfect benchmark scenario as tiefighter9 is exactly the 2013 build described in this blog post and tiefighter21 is exactly the 2016 build described in this blog post. And everything runs on bare metal, Ubuntu 14.04 x64 LTS.

As you can see here, tiefighter21 is almost 2x faster: 528s for the 2013 Ivy Bridge server build, and 294s for the 2016 Skylake server build. Our new Skylake based Discourse servers are 1.8x faster at running the Ruby unit tests in the Discourse project, to be exact.
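For anyone who wants to check the arithmetic, here is a minimal Python sketch that converts the mm:ss timings quoted above into seconds and computes the ratios. The helper names `to_seconds` and `speedup` are made up for illustration; only the timings come from this thread.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '8:48' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(old: str, new: str) -> float:
    """How many times faster the new run is than the old run."""
    return to_seconds(old) / to_seconds(new)


# Timings as quoted in the thread.
unit_tests = speedup("8:48", "4:54")      # tf9 vs tf21, bare metal
docker_build = speedup("16:00", "5:33")   # build01 (VM) vs tf21

print(f"unit tests:   {to_seconds('8:48')}s -> {to_seconds('4:54')}s = {unit_tests:.2f}x")
print(f"docker build: {docker_build:.2f}x")
```

The unit test ratio works out to roughly 1.8x and the docker build to a bit under 3x, matching the figures given above.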

I hope that data answers your question definitively since you both kept asking over and over and not believing me

dakull · November 24, 2015, 1:52pm

@codinghorror wow, that’s just like wow. I wonder if this has any relevance:

EDIT:

I don’t want to be a nitpicker; it’s just really hard to grasp that switching to Skylake would yield a 1.8x increase solely because of the CPU when the difference between tick/tock on the Intel side barely added +5% performance gains per iteration.

codinghorror · November 24, 2015, 6:50pm

Tiefighter 9 actually has the 512gb Samsung 840 pro because the 830 was no longer being sold at the time that one was built.

Anandtech says avg data rate on the same test for the 840 model is 285 vs the 309 you quoted for the 850 pro. The 850 was mostly more consistent in perf than the 840, not so much faster.

Sorry, I should have mentioned that. I am sure disk is a factor, but I would still expect no less than 1.33× improvement based on the single threaded JS benchmarks at the same clock rate. Plus, the base clock rate is higher as previously noted multiple times – 3.6 GHz versus 4.0 GHz, that is more than a 10% improvement on the basis of raw clock rate alone.
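The rough expectation above can be written out as a quick Python sketch. It assumes the per-clock gain and the clock-rate gain simply multiply, which is a simplification and not something measured in this thread; `expected_speedup` is a hypothetical helper name.

```python
def expected_speedup(per_clock_gain: float, old_ghz: float, new_ghz: float) -> float:
    """Naive estimate: per-clock improvement scaled by the clock-rate ratio.

    Assumes the two factors are independent and multiply, which real
    workloads only approximate.
    """
    clock_gain = new_ghz / old_ghz
    return per_clock_gain * clock_gain


# Figures quoted in the thread: 1.33x at the same clock, 3.6 GHz vs 4.0 GHz.
clock_only = 4.0 / 3.6
estimate = expected_speedup(1.33, old_ghz=3.6, new_ghz=4.0)

print(f"clock rate alone: {clock_only:.2f}x")
print(f"combined estimate: {estimate:.2f}x")
```

Under these assumptions the estimate comes out near 1.5x, somewhat short of the measured 1.8x on the unit tests.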

Anyway, those are the actual Ruby unit test numbers from the current version of the Discourse project, which is completely open source. You can download it and benchmark these tests yourself if you don’t believe our numbers.

dakull · November 24, 2015, 10:54pm

Ok, so I wanted to check if the tests were IO bound:

pcie ssd: Finished in 6 minutes 18 seconds
fast microSD drive: Finished in 7 minutes 52 seconds

Clearly IO is not a real issue: the microSD card is orders of magnitude slower than the PCIe SSD, so in reality the delta between, let’s say, the 830/840 and the 850 would be a matter of seconds.

Still 1.8x is huge compared to the IPC improvements and the extra frequency boost. If it’s only the CPU I wonder what exactly makes it that fast.

It’s clearly not the ECC ram that slows down the Xeon E3-1280 V2 since it scores the same as the 3770K:

Maybe the answer is in these synthetic benchmarks:

Where cache and memory, arithmetic et al are from 1.9x to 2.1x faster compared to 3770 (and the E3-1280 V2)

tl;dr: sometimes synthetic benchmarks do translate to reality. In any case, it was quite fun to get to the bottom of this.


womble · November 29, 2015, 4:11am

Dan Luu doesn’t think that blindly following Google’s examples is an automatic winner:

Note that these are all things Google tried and then changed. Making mistakes and then fixing them is common in every successful engineering organization. If you’re going to cargo cult an engineering practice, you should at least cargo cult current engineering practices, not something that was done in 1999.

Meanwhile, Joe Chang has a sick burn on Windows 95:

I recall that the pathetic (but valid?) excuses given to justify abandoning parity memory protection was that DOS and Windows were so unreliable so as to be responsible for more system crashes than an unprotected memory system.

codinghorror · December 10, 2015, 2:46am

It is a very good piece from someone who was there, and well worth reading, but a bit… hand-wavy for my tastes. He does not address all three studies, and does not acknowledge that we are not exactly copying Google from 2000, we are using commodity parts that reflect the state of 2016 computing – which is considerably more advanced than 2000, mostly due to massive integration (hey where did those dual CPU slots and network cards go?) and everything going solid state.

The Puget Systems data was also ignored. But it jibes with what I have experienced. For example, in 2011 I knew about so many consumer SSD failures that I wrote a whole article about it. Today, we have used 24 consumer SSDs for the last 3 years in our servers and had exactly zero fail.

codinghorror · December 11, 2015, 1:13am

Also, this is interesting.

Introducing "Yosemite": the first open source modular chassis for high-powered microservers - Engineering at Meta

But that solution didn’t work well because the single-thread performance was too low, resulting in higher latency for our web platform

Where else have I heard this… oh yes

Throwing umptazillion CPU cores at Ruby doesn’t buy you a whole lot other than being able to handle more requests at the same time. Which is nice, but doesn’t get you speed per se.

kernrot · December 17, 2015, 11:53pm

Also, this is interesting. IEEE publishes study by University of Toronto on RAM corruption issues - far more common than previously estimated.

DRAM’s Damning Defects—and How They Cripple Computers - IEEE Spectrum

BenBasson · January 10, 2016, 2:47pm

After reading this blog post and looking at the amount of discussion about this issue, one could easily argue that the cost of making a decision between ECC and non-ECC is greater than just buying something and living with the consequences.

codinghorror · February 9, 2016, 10:50am

“As a whole, hardware appears to be continuing the trend of becoming more and more reliable.” https://www.pugetsystems.com/labs/articles/Most-Reliable-Hardware-of-2015-749/

jb24 · March 13, 2016, 4:08am

Should I now wait 2 years for enterprise tech bargains? After 15 years, has enterprise tech as a sector become moot? Or jump with both feet into 2016 Skylake?

darix · March 14, 2016, 6:11pm

DEF CON 19 - Artem Dinaburg - Bit-squatting: DNS Hijacking Without Exploitation

PlusOneCharisma · June 29, 2016, 3:45am

Great stuff. It seems that the 2007 study you linked to is at odds with the findings of several of the other papers. Specifically, it (a) assumes that there is no correlation between soft and hard errors, and (b) uses the Poisson distribution to compute the upper bound of soft errors. Given that the field study with the largest pool of machines finds that the occurrence of a single soft error greatly increases the likelihood of further soft errors, and also often is the precursor to a hard error, both of these assumptions are suspect.
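To make the concern about the independence assumption concrete, here is a minimal Python sketch of what a Poisson model implies. It is not the 2007 study's actual calculation; the error rate and exposure values are hypothetical placeholders, and the function names are made up for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) when N ~ Poisson(lam), i.e. errors arrive independently."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_at_least(k: int, lam: float) -> float:
    """P(N >= k) under the same independence assumption."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


def rate_upper_bound_zero_events(exposure: float, confidence: float = 0.95) -> float:
    """Upper confidence bound on the error rate when zero errors were seen.

    Solves exp(-rate * exposure) = 1 - confidence for rate.
    """
    return -math.log(1.0 - confidence) / exposure


# Hypothetical numbers, chosen only to show the shape of the result.
lam = 0.1  # assumed expected soft errors per machine per year
print(f"P(at least 1 error)  = {prob_at_least(1, lam):.4f}")
print(f"P(at least 2 errors) = {prob_at_least(2, lam):.4f}")
print(f"95% upper bound on rate after 1000 error-free machine-years: "
      f"{rate_upper_bound_zero_events(1000):.5f} per machine-year")
```

Under independence, a second error on the same machine is far less likely than the first. If errors actually cluster, as the large field study reports, that model understates how often a machine with one soft error goes on to have more.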

Without meaning to muddy the waters, I am also struck by the fact that ECC seems to be discussed in isolation, rather than as part of a greater effort to preserve data integrity. Surely a system that is important enough to have its data protected by ECC should also be using an atomic COW file system such as ZFS or BTRFS? Of course, I am approaching this as someone looking to put together a single workstation PC, rather than a server farm where it may well be that everything fits into memory…

camilus · August 23, 2016, 11:57am

I would love to have ECC ram if Intel was not so greedy and stopped artificially crippling its sanely priced chips (disabling ecc support, disabling ht on i5s, locking multipliers, randomly disabling virtualization extensions, …), included support for the now enabled functions in all chipsets and made it mandatory for motherboard manufacturers to support them too (with buggy implementations not counting). Giving us a few more pcie lanes/sata ports/usb 3 ports, support for more ram (128 gb would be nice), higher tdp desktop chips, adding a few more cores (so an i3 would now have 4c/8t, an i5 6c/12t and an i7 8c/16t) and offering a desktop cpu without the igpu and more cores/cache instead (let’s call it an i8 and give it 12c/24t and double the l3 cache). Finally ending the scamming with mobile parts and calling glorified i3s i5s/i7s.

And moving towards an extra memory channel or two for the arch after Skylake (this would also mean doubling the max memory to 256 gb and moving to 6 to 8 memory channels for the expensive cpus which have quad channel memory now and increasing their core count by the same %).

Oh and reversing the price creep by a 30% price cut across the board.

And then I wake up.

womble · September 6, 2016, 10:56pm

Hey, you’ve gotta have your product differentiation, or you might not capture all that economic surplus…

Also, just because no post is complete without a SwiftOnSecurity tweet:

codinghorror · September 6, 2016, 11:01pm

And that’s proven how, exactly? Lots of problems can manifest as bad data.

If there is a run of memtest that fails years later after the initial build, I’ll accept that as a valid answer, otherwise… voodoo computing.

Jhkh · October 15, 2016, 10:40pm

Hi, this article is still very interesting in 2016
I am by no means a pro or even a programmer, and I’m aware that personal experience may not have much value compared to wide scale tests.

But here is what happened to me last year.
I own an Acer consumer laptop with 2x8GB Kingston modules and an i7-4702MQ. Pretty crappy on the power supply side by the way, but that’s not the point.
I had never bothered to think of what benefit ECC could bring.
One day, I did a full system wipe and copied back all my files to the internal hdd (getting it partitioned more conveniently in the process).
All went fine… Apparently. I soon noticed corruption on some frames in many of my videos.

The culprit was one of the ram modules. But the only affected files were bigger than 1GB. I reproduced the problem with TeraCopy: it would happen about 1/3 of the time when copying a big film over to my external drive and back with the integrity check enabled.

The memory modules worked fine one by one, and even both in the opposite slots. The MB was failing ? No.
Back in their slots, problem back. 24 hours memtest86 went all clear, but the files were still damaged while copying.
I then cleaned the slots with a brush and compressed air, and voilà ! Even being very cautious with my laptop, dust and moisture had made me lose two days troubleshooting.

The sfc scan reported large corruption as well.

After that day, my computer still works fine one year later, but I do consider buying an ECC RAM capable rig to avoid damaging part of the backup (including many family films that I consider valuable). And I do periodic integrity checks on my backup external drive now.
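A periodic integrity check like the one described can be approximated with a checksum comparison. The following is a minimal Python sketch using SHA-256 from the standard library; `sha256_of` and `copies_match` are hypothetical helper names, and the file paths in the usage comment are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copy: Path) -> bool:
    """True if both files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(copy)


# Example usage (placeholder paths):
#   copies_match(Path("film.mkv"), Path("/mnt/backup/film.mkv"))
```

One caveat: right after a copy, the operating system may serve the re-read from its file cache rather than from the disk, so a check run later, or after remounting the drive, is more likely to exercise the stored data. And since the hashing itself runs through the same RAM, a faulty module could in principle produce a false mismatch or mask a real one; a mismatch is a reason to investigate, not proof of which component failed.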

Thankfully the damaged videos are just on a few frames and still watchable, but many of them are damaged. So even a careful noob with standard consumer use may run into such issues.

Sorry for the tl ; dr effect xD

shankarab · January 23, 2017, 7:41pm

Hi Jeff, Have you made any price comparisons lately (even better, TCO comparisons) between your custom build servers and an equivalent cloud subscription? (you made one a few years ago, I’m curious to know if any data has changed in favor of cloud) Also, another question I wanted to ask, do you run your web sites in VMs or on the physical server itself? You can technically run 4 VMs on a server with 64 GB RAM (or so I’m told). Thanks for your informative posts.