The Hot/Crazy Solid State Drive Scale

Well, isn’t an SSD fast? You shave a few seconds here and there off some application starts (first starts, that is, if you’re short on RAM, and I guess you’re not). And then you’re off the computer for a full day at best to replace the drive, not considering other inconveniences like shipping the bad drive back for replacement if it’s under warranty, or restoring from a backup which is of course complete and was taken the minute before your drive failed. Looks really ugly to me, and I write this using a one-year-old SSD that hasn’t failed yet. It’s time for a new backup.

Well, has anybody thought about the ecological impact besides just thinking in $$$? SSDs have a bigger CO2 footprint and use more rare-earth source materials during production. The more of them we throw away, the more we pollute the environment. A hard disk is already bad enough, so let’s not replace it with something even worse.

Guys, all I read here is ROI etc. We cannot continue like that, I think. Considerations must be made from all points of view, not only the financial/economic one.

Of interest to me is how these failure rates correlate to the operating system on which they are used. Are the failures consistent regardless of OS or is there a culprit?

You may want to examine how you’re using the disks. If I remember correctly, flash memory cells have a limited number of write cycles (it used to be 100,000 cycles, but I’m sure it’s gotten better), and these cells are usually erased and written in blocks. I’m sure the number of cycles is pretty high, but anything that causes continuous writing to the disk may cause it to die earlier than expected. For example, a daily virus scan or backup procedure would cause the last-access time and archive flag on all of your files to be updated daily, resulting in a lot of unnecessary write cycles (I believe Windows can be configured to not update the last-access time). Also, if you have a low-memory system or do memory-intensive work and the system swap file is located on your SSD, you may be doing continuous writing to the disk that will wear it out.
In short, flash memory is good for fast storage but simply isn’t meant for continuous small random writes. I would disable last-access time updating (http://windows7themes.net/registry-tweaks-how-to-disable-last-access-filestamp-in-windows-7.html) and possibly even move your swap file to a standard hard drive. Then I bet you will see a much better lifespan.
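
To put rough numbers on that reasoning, here is a back-of-the-envelope wear-out estimate. It’s a minimal sketch: the capacity, program/erase cycle count, daily write volume, and write amplification factor are all illustrative assumptions, not specs for any particular drive.

```python
# Back-of-the-envelope SSD wear-out estimate.
# Every constant here is an assumed, illustrative value.

CAPACITY_GB = 160            # assumed drive capacity
PE_CYCLES = 10_000           # assumed program/erase cycles per flash cell
HOST_WRITES_GB_PER_DAY = 20  # assumed daily writes (swap, scans, backups, timestamps)
WRITE_AMPLIFICATION = 10.0   # assumed: small random writes force whole blocks to be rewritten

# Total data the flash can absorb before cells wear out,
# assuming wear-leveling spreads writes evenly across the drive.
total_endurance_gb = CAPACITY_GB * PE_CYCLES

# Actual NAND writes per day, inflated by write amplification.
nand_writes_gb_per_day = HOST_WRITES_GB_PER_DAY * WRITE_AMPLIFICATION

lifetime_days = total_endurance_gb / nand_writes_gb_per_day
print(f"Estimated wear-out after ~{lifetime_days:,.0f} days (~{lifetime_days / 365:.1f} years)")
```

Even with fairly pessimistic assumptions the cells take many years to wear out; the lever that actually hurts is write amplification from lots of tiny random writes, which is exactly what moving the swap file and disabling last-access updates reduces.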

What is considered a “generation” for a technology platform?

For a slightly different viewpoint:

I’ve had way too much experience with SSDs in the past three and a half or so years. All of it is with high-availability mission-critical embedded devices using surface-mount SSDs with IDE interfaces, quite a different form factor and application from the ones you are probably using. Most of my experience has been bad.

The SSDs fail catastrophically after a few hundred power cycles. That’s just enough for them to get through system testing, as we were to find out, then fail hard in the field after a few months of use. This is not an issue of exhausting the write-cycles on the underlying flash. We do very few writes, and all the devices we use feature wear-leveling.

We know that the failure is in the NAND flash part itself. We can remove the flash from a working SSD and place it in a failing SSD and the failing SSD “revives”. The SSD apparently has its own firmware in the NAND flash part, since we cannot use off-the-shelf uninitialized NAND flash parts of the same brand and model to the same effect.

Life is not good. I cannot recommend these kinds of SSDs to my clients for these kinds of applications.

(Sorry, NDA keeps me from saying much more.)

I should also mention, this is not an issue with file system corruption. We can deal with that. Failure modes include: boot blocks or partition blocks being corrupted (we never write to those in the field), or the device quits responding to commands at all (this is when replacing the flash part revives it).

Very odd.
I’ve had my “34nm 160 GB Intel® X25-M Mainstream SATA Solid State Drive” since 5/18/2010, and it works like a champ. I use the Intel SSD Toolbox as well.

So, it appears, from the article at the bottom of this comment, that the “write limit per block” issue is no longer an issue. I’ll search here and there to find a reason why they fail so easily (and no, blaming cheap imported manufacturers doesn’t count, as expensive manufacturers have the same problem).

One thing I’ve considered is that a lot of electronics manufactured circa 2005 had issues with electrolytic capacitors, usually in the power circuitry. Chinese electrolytic capacitors used in consumer electronics manufacturing during that period (~2005) had a short MTTF (mean time to failure); see Samsung LCD monitors, most of which have faulty power supply circuits due to these shoddy capacitors.

The article is dated, but has some useful information:
http://www.storagesearch.com/ssdmyths-endurance.html

That’s way too many SSD failures to be a coincidence, and it doesn’t jibe with statistics gathered from a broader base; I think your drives are being damaged by something external. Have you been plugging all those SSDs into the same power supply, by any chance? Do you run your wall power through a UPS? If SSDs were that unreliable in general, there would be users with thousands of them making noise about the failure statistics.
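
For a rough sense of why a streak like this looks statistical rather than coincidental, here is a quick binomial check. The drive count, failure count, and “normal” annual failure rate below are assumed, illustrative numbers, not data from the post.

```python
# How likely is a failure streak like this if SSDs had an ordinary annual
# failure rate? The counts and rate below are assumed for illustration.
from math import comb

N_DRIVES = 8      # assumed number of SSDs owned over the period
K_FAILED = 6      # assumed number that failed within a year
AFR = 0.05        # assumed "normal" annualized failure rate (5%)

# P(at least K_FAILED failures out of N_DRIVES), assuming independent failures
p_at_least_k = sum(
    comb(N_DRIVES, k) * AFR**k * (1 - AFR)**(N_DRIVES - k)
    for k in range(K_FAILED, N_DRIVES + 1)
)
print(f"P(>= {K_FAILED} of {N_DRIVES} fail in a year at {AFR:.0%} AFR) = {p_at_least_k:.2e}")
```

If that probability comes out vanishingly small, then either the true failure rate of those models is far higher than the quoted AFR, or something local (power, heat, vibration) is killing the drives, which is the commenter’s point.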

My computer at work has an SSD from circa Jan 2011, 12GB RAM and a 3.2GHz Quad-Core Xeon. My home desktop has an HDD from 2007, 2GB RAM and a Core 2 Duo.

The home desktop does what I want much faster, much smoother, much more responsively. The difference? Software.

Work machine = Win 7 + Visual Studio.
Home machine = Debian + XFCE + Geany.

HDD is not the main problem.

Here at the project I work on, we have been using HDDs and SSDs for quite some time, so we could run a really good real-life test. We can say for sure that, YES, SSDs really pay for their price in speed. I’ve written a post detailing the subject here: http://codemadesimple.wordpress.com/2011/05/03/ssdpower.

I’ve had 4 SSDs since 2007 (the first was an $1100 32GB MTRON, then a Vertex, Falcon, and C300). No failures; plain ol’ everyday MacBook/Xcode/PS use, mostly. Not casting doubt on anyone else’s bad luck, just another anecdote.

Largely on Jeff’s recommendation, my boss got 10 developers 256 GB SSDs (for about $750 each) about 2 years ago. They were blazing fast, but only one survived to see its first birthday. He even gave up sending them back to Crucial: after the first few were replaced and failed again, it became clear that any replacement SSD would inevitably cause yet another unexpected half-day of downtime for that developer within the next year, and most likely at least some lost work, even with daily drive-image backups.

Despite my begging for another SSD, he won’t let us touch them now. Some things are just too crazy for some people, no matter how hot they are.

I think a major contributing factor to the early failures was the mandatory whole-disk encryption, but given the nature of our business that is not negotiable. If the drives could encrypt natively using a manageable key scheme that kept our sysadmins happy, they might last longer, and we would pay a premium for such drives.

@BloomCB - “That drive could nuke before I’m done with this post and I would be just fi”

I lol’d hard. Srsly, I wish I could think up stuff of that level of clever (maybe it was derivative of the 4chan ‘sniper’ threads, but yours was still gold because it was totally unexpected).

Lulz aside, it seems it’s all about the frequent little writes. And @Daniel Olsen - if you’re doing whole-disk encryption, you’re asking for trouble with an SSD. But how can you and your sysadmins not find a decent encryption scheme that doesn’t require whole-disk encryption? Even TrueCrypt (as one entry-level example) would suffice for 99% of things I can think of: it doesn’t ‘bleed’ unencrypted data at all.

I’ve got a production server running a database on two Intel X-25M (80 GB) SSDs. The drives have been working flawlessly for 14,000 hours so far.

My 300 GB 10,000 RPM Velociraptors have died in droves. I have 5 of them: 2 in RAID 1, 2 in RAID 0, and a spare. I have replaced 8 drives over the last 1.5 years. That’s right, some have been replaced twice!

The MTBF numbers are very misleading.

What I don’t understand is how computers on cheap consumer drives have lasted 6-10 years as hand-me-downs after I’m done with them, while all the “server” drives fail after so little time.
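
For what it’s worth, a spec-sheet MTBF converts to an annualized failure rate like this. It’s a minimal sketch under an exponential (constant-hazard) failure model, and the 1.4-million-hour MTBF below is just an assumed, typical-looking value, not any specific drive’s rating.

```python
# Convert a spec-sheet MTBF into an annualized failure rate (AFR).
# The MTBF below is an assumed example value, not any specific drive's rating.
from math import exp

MTBF_HOURS = 1_400_000       # assumed manufacturer MTBF claim
HOURS_PER_YEAR = 24 * 365    # drive powered on around the clock

# Under an exponential failure model: AFR = 1 - exp(-hours_per_year / MTBF)
afr = 1 - exp(-HOURS_PER_YEAR / MTBF_HOURS)
print(f"MTBF of {MTBF_HOURS:,} h implies an AFR of about {afr:.2%}")
```

An AFR well under 1% is what a figure like that implies; 8 replacements across 5 drives in 18 months corresponds to an observed rate orders of magnitude higher, which is why the MTBF numbers feel so misleading.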

I have shipped literally hundreds of Intel G1 and G2 SSDs to my customers and never had a single in-the-field failure (save for one drive in a laptop where the drive itself functioned fine but one of the contacts on the SATA connector was flaky, probably from vibration damage over a lot of airplane flights, and one DOA drive). I think you just got unlucky there.

The other brands in question, however, kinda suck.

The SSD failures must be due to vibration and/or heat from use in laptops. I’ve had two Intel X-25M 80GB Gen 1 drives in my home desktop for a little over two years without failure, and I’ve had six Gen 2 drives in my work desktop for about a year without failure. At my office we have a couple more workstations with multiple X-25M’s, and I think there has only been one failure in the last year.

I understand that once they approach the 1 TB threshold, the Bank of America principle will apply and they will become TOO BIG TO FAIL.