The Hot/Crazy Solid State Drive Scale

I’ve been using an Intel X25-M SSD for 3 years and it “failed” 3 times.

Every time I’ve been able to make it usable again with the same procedure:

  • Boot from a Linux live CD that includes an NTFS driver (for example, Ubuntu’s “Try Ubuntu” mode)
  • Mount the SSD in read-only mode (mount -t ntfs -o ro…)
  • Copy the disk’s data to another disk. This will recover most data, except files that were being updated when the disk failed
  • Use “ATA Secure Erase” to completely erase the SSD: https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
  • Repartition and reformat the SSD
  • Restore the disk from a backup.
  • Copy data that was recovered while the disk was mounted in read-only mode (to partially recover data that’s more recent than the last backup)
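The copy step of this procedure can be sketched in Python. This is a minimal sketch, assuming the SSD is already mounted read-only somewhere. `salvage_tree` is a hypothetical helper name. It skips files that fail to read instead of stopping, because files being written when the drive failed may be lost anyway. It does not cover the Secure Erase step, which is destructive and should be done by following the kernel wiki instructions.

```python
import os
import shutil


def salvage_tree(src_root: str, dst_root: str) -> tuple[int, list[str]]:
    """Copy every readable file under src_root into the same layout under dst_root.

    Files that raise OSError while being copied (for example, ones that were
    being written when the drive failed) are recorded and skipped instead of
    aborting the whole copy.
    """
    copied = 0
    skipped: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Mirror the source directory structure on the destination disk.
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 also preserves modification times, which helps later when
                # deciding which recovered files are newer than the last backup.
                shutil.copy2(src, os.path.join(target_dir, name))
                copied += 1
            except OSError:
                skipped.append(src)
    return copied, skipped
```

For example, `copied, skipped = salvage_tree("/mnt/ssd", "/media/rescue")`. Both paths are placeholders for wherever the read-only SSD and the healthy disk are mounted.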

I’m not sure if that would work with other kinds of SSD failure, but it could be a solution instead of buying a new SSD every time. In any case, you would still need a good backup strategy, and you should keep documents on a non-SSD drive (except those that really need fast I/O).

Has anyone else noticed that almost all the comments here report their SSDs lasting over a year? Jeff, can you tell us what the usage pattern of these SSDs was? I have had a G1 X-25M 160GB SSD, among the first from the hot batch, which I got from an Intel guy to test-drive their new babies. It has not failed in the last 2 or so years…

Jeff,
Would you please provide more details about these failures? You say:

"And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty."
Does this mean no data was readable from any sector of any of the failed SSDs? Or was it recoverable when attached to another machine (as outlined in Christos' comment earlier)?

Cheers,

Maybe I’m just not with the times, but I don’t have an SSD and have never thought to myself “Boy, I wish this was going faster.” On the other hand, I have thought “Hmm… I need more storage.”

Several years back I got a TI-83 Plus, and the manual contained a warning that it is a bad idea to repeatedly archive and unarchive things to and from the flash memory, because doing so could damage it. Ever since then I haven’t trusted SSDs, and I still think of them as far less reliable than a rotating hard drive. Thanks for helping confirm my superstition for the 50th time.
As a side note, it wasn’t until about a year or two ago that I finally accepted that the same technology that powered all of the painfully slow USB2 flash drives I have used over the years could ever be as fast as a hard drive (even the 5,400 RPM ones).

I’ve been using a 160 GB Intel X-25M Gen 2 for 16 months or so in my laptop, for work and personal use, and it’s still going strong. My OS and apps are on there, and I have a pair of striped disks for other stuff. I have a Windows Home Server that does a decent job backing it all up… without that I’d have mirrored the other disks.

The SSD drives are expensive, granted, and I can see why somebody might try to justify the time savings…but to me that’s not the point. I don’t mind spending on things that matter to me, and taking the hit on other things. Given that I spend 8-10 hours a day on my laptop, the perception of speed makes me happier. It’s a quality of life issue.

I applaud your courage (deep pockets), but this technology is still immature. I’m a poor college student and as such, do not have the revenue to continually buy a new, extremely expensive piece of hardware.

I was on a single-core laptop for about three years before I finally bought a custom desktop (3-core AMD Athlon II / 4 GB DDR3 memory / 500 GB WD HDD). While my projects don’t call for a stronger machine now (not that I could afford it anyway), I’d rather wait for the technology to get cheaper and a bit more mature.

Really, it’s a matter of convenience vs. stability, and which one is more fitting for you. SSDs are not fit for everyone right now, because of the issues highlighted in the comments/post.

I do, however, await the day they become stable technology!

Surely there’s a 2-3 year warranty on those drives. If they fail every 8 months, it must cost the manufacturers an arm and a leg to replace them all: they have to provide 4 SSDs for the price of one.
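The “4 SSDs for the price of one” estimate comes from dividing the warranty period by the failure interval. Here is a back-of-the-envelope sketch. It assumes a steady failure every 8 months, as the comment supposes, although real failures are not that regular. It also assumes that a failure on the last day of the warranty is still covered.

```python
def replacements_under_warranty(warranty_months: int, months_between_failures: int) -> int:
    """Count failures that fall inside the warranty window, one replacement each."""
    # Assumes failures arrive at a fixed interval and that a failure on the
    # last day of the warranty is still covered.
    return warranty_months // months_between_failures


for warranty in (24, 36):
    n = replacements_under_warranty(warranty, 8)
    print(f"{warranty}-month warranty: {n} replacements, {n + 1} drives in total")
```

With 2- and 3-year warranties, this gives 3 and 4 replacements. In other words, the vendor ships 4 or 5 drives for a single sale.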

I know I am tempting fate here, but I have had three computers with SSDs for more or less the same period and I’ve only had one failure (one SSD died after 24 hours of use, but that also happens with other components). Two of them are in laptops: a second-generation Samsung with 60 GB and an OCZ Vertex 2 120. In my desktop I have two first-generation Samsungs with 60 GB each connected in RAID 0 (yes, ZERO not ONE :-).

But then I also have daily backups (just in case). I may be the odd one out but, as always, one only hears about the failures, not about the systems that keep working just fine.

I think Jeff just has a case of “too much money”. Most home users don’t have the money, or most certainly don’t want to spend $500 (OCZ Vertex 3 240 GB, Newegg) every 6-12 months because a drive fails. It’s not about the backups, because a 500 GB spinning platter for backup is cheap, and you should have one anyway. Sure, if you have tons of money, or your employer is paying for the SSD, then go right ahead. But failing this often is just not an option. $500 would be 1/3 to 1/2 the price of a complete new system. I don’t want that failing every year.

Jeff,

That is exactly what I’m seeing as well. As an early adopter of SSDs, I’ve found that the consumer-grade stuff dies in non-fun ways. The total failure of the drive leads me to believe it is a manufacturing problem, not a problem with the NAND flash chips themselves.

If you look at the specs, most of them are rated at one uncorrectable error per 10^15 bits read, the same as their HDD brethren. What most companies won’t tell you is the hard failure rate in the field. This alone is why I warn people away from putting consumer-grade SSDs in their servers.
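A rating of 10^15 for uncorrectable errors conventionally means one unrecoverable read error per 10^15 bits read. It helps to turn that into bytes. This is a sketch of the arithmetic that treats the rate as a simple average, which is a simplification because real errors are not evenly spaced. The 240 GB full-drive read is a hypothetical example. Note that this figure says nothing about sudden whole-drive failures, which is the comment’s real concern.

```python
BITS_PER_BYTE = 8
TB = 10**12  # decimal terabyte, as drive vendors use


def bytes_per_uncorrectable_error(exponent: int) -> float:
    """Average bytes read per error for a spec of 1 error per 10**exponent bits."""
    return 10**exponent / BITS_PER_BYTE


def expected_errors(bytes_read: float, exponent: int) -> float:
    """Expected number of uncorrectable errors after reading bytes_read bytes."""
    return bytes_read / bytes_per_uncorrectable_error(exponent)


print(bytes_per_uncorrectable_error(15) / TB)  # 125.0
# Hypothetical example: reading a full 240 GB drive once.
print(expected_errors(240 * 10**9, 15))
```

So a 10^15 rating works out to roughly one bad read per 125 TB.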

I blog about storage at http://www.sqlserverio.com and cover SSD stuff quite a bit.

Jeff, the company where I work has ordered SSDs for everyone. We can’t wait for them to arrive!

SSD is a no-brainer for me. One commenter said it will save you a minute when you boot, so what? He must not understand what it’s like to build a large enterprise application consisting of tens of thousands of files, where disk I/O is a productivity-killing bottleneck. If a 30-minute build becomes a 5-minute build, we save 25 minutes per build. Multiply that by a rate of X dollars per hour and by the Y builds we do before the SSD dies, and you get Z dollars. There’s our return on investment. Faster builds also mean more frequent builds and integrations, which should lead to higher-quality code. We need to put a dollar value on those improvements (fewer tech support calls, etc.) and include it in our ROI calculation.
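The X/Y/Z return-on-investment formula can be written out as a small function. This is a minimal sketch that makes a few assumptions. It subtracts the drive’s price to get a net figure. The example inputs ($60/hour, 1,000 builds before the drive dies, a $500 SSD) are hypothetical placeholders. It also leaves out the harder-to-price benefits such as more frequent integrations and fewer support calls.

```python
def net_savings(
    old_build_min: float,
    new_build_min: float,
    hourly_rate: float,
    builds_before_failure: int,
    ssd_cost: float,
) -> float:
    """Value of build minutes saved over the drive's life, minus its price."""
    minutes_saved = (old_build_min - new_build_min) * builds_before_failure
    return minutes_saved * hourly_rate / 60 - ssd_cost


def break_even_builds(
    old_build_min: float, new_build_min: float, hourly_rate: float, ssd_cost: float
) -> float:
    """Number of builds after which the drive has paid for itself."""
    saved_per_build = (old_build_min - new_build_min) * hourly_rate / 60
    return ssd_cost / saved_per_build


# Hypothetical inputs: $60/hour, 1,000 builds before the drive dies, a $500 SSD.
print(net_savings(30, 5, 60, 1000, 500))  # 24500.0
print(break_even_builds(30, 5, 60, 500))  # 20.0
```

The break-even figure is often the more persuasive one. Under these assumptions, the drive pays for itself after about 20 builds, even if it later dies.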

I definitely see the ROI, so I don’t think the hot/crazy “How I Met Your Mother” analogy is really necessary :)

Over at blekko, we’ve had 3 SSD failures after 1.5 years, out of 700 drives. These are Intel X-25M 160G2 drives.
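These figures can be converted into an annualized failure rate, which makes them easier to compare with other reports. This is a rough sketch. It assumes all 700 drives were in service for the full 1.5 years, which the comment does not say.

```python
def annualized_failure_rate(failures: int, drives: int, years: float) -> float:
    """Failures per drive-year of operation."""
    # Simplification: assumes every drive was in service for the whole period.
    return failures / (drives * years)


afr = annualized_failure_rate(3, 700, 1.5)
print(f"{afr:.2%}")  # 0.29%
print(f"{3 / 700:.2%}")  # 0.43% of the fleet failed over the whole period
```

Under that assumption, this comes to roughly 0.3% of drives failing per year.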

I was just looking at moving to SSDs last night. I’ll probably wait until the new Intel drives come out later this year. However, I have a serious question.

Right now I run two drives in RAID 1, and I do full system image backups every couple of days to an e-go via USB. My question is, does it make sense to run RAID 1 with SSDs? My concern was that I would just be wearing down both drives just as quickly, without any striping benefit. However, if what you say is true about frequent SSD failure, then RAID 1 actually appears very smart. Thoughts?

Also, when these SSD drives fail, what do you do with them? Return them, chuck them out, or reformat them and wait until the next failure?

Jeff
I am scared

#1
I am checking my SSD health status with http://ssd-life.com/ once a week.
Currently it says: Estimated lifetime: 26 years, 11 months, 25 days.

Do you see any value in tools like those?
Do they actually work? Or is my Friday afternoon paranoia check useless?

#2
Why do SSD drives fail? Do they all fail for the same reason?

Thanks
Interesting view on SSD drives and nice blog post!

From the perspective of someone who recently downloaded and began to read Al Gore’s book “Our Choice”, your SSD strategy isn’t quite pointing in the right direction, resource-wise.
The fact that I’m the first to comment from this perspective is also worrisome… but let’s be optimistic. People of the Nerdery, please consider your impact on the environment when getting new equipment and take a step in the right direction: favor stuff that will last over stuff that will save you 10 minutes per day. We live in a spaceship, supplies are limited, and the equation behind your choice needs to consider this too. Thank you.

I’ve had my SSD (Intel X25-M) in my desktop for exactly two years with no failure. The machine is fully powered on 24/7 (i.e. no sleep/hibernate).

The first thing I did in Windows was move the page file to a mechanical disk, along with the browser caches. The last thing you want is lots of small writes to the SSD. I only have the OS and apps on the SSD. Data is on a pair of mechanical VelociRaptors in RAID 1 (one of which has failed; they are exactly the same age as the SSD). My experience with WD’s Raptor/VelociRaptor drives has been very poor, which is why I put them in RAID in the first place. These mechanical drives are hardly used at all other than for the page file and browser caches. All my data sits on a 4TB RAID 5 NAS with WD Caviar Green disks, which has had zero failures and is also powered on 24/7.

If you look closely at the warranties of SSDs, you will notice that they are shorter than those of their traditional rotational counterparts. Flash has well-known issues such as write speed degradation over time and a very limited lifetime for every cell.
I think one of the wiser approaches is to use flash as a non-volatile cache. That way, if your flash fails, your data is still safe on a traditional HD, so you have peace of mind while enjoying the speed of flash.
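The flash-as-cache idea can be illustrated with a toy write-through cache. This sketch rests on several simplifying assumptions. The HDD and the SSD are modeled as Python dictionaries that map block numbers to bytes, and the cache evicts the least recently used block. `WriteThroughCache` is a hypothetical name; it is not how any particular product (such as Intel’s Z68 caching) is implemented. What it shows is that every write lands on the hard disk first, so losing the flash cache costs speed, not data.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy model of flash used as a cache in front of a hard disk.

    Every write goes to the backing store (the "HDD") first, so losing the
    cache (the "SSD") never loses data; it only makes reads slower again.
    """

    def __init__(self, backing: dict[int, bytes], capacity: int) -> None:
        self.backing = backing
        self.capacity = capacity
        self.cache: OrderedDict[int, bytes] = OrderedDict()

    def read(self, block: int) -> bytes:
        if block in self.cache:
            self.cache.move_to_end(block)  # mark as most recently used
            return self.cache[block]
        data = self.backing[block]  # cache miss: slow path to the HDD
        self._insert(block, data)
        return data

    def write(self, block: int, data: bytes) -> None:
        self.backing[block] = data  # durable copy on the HDD first
        self._insert(block, data)

    def cache_failed(self) -> None:
        """Simulate the flash device dying: the cached copies are simply gone."""
        self.cache.clear()

    def _insert(self, block: int, data: bytes) -> None:
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
```

A write-back cache would be faster for writes. However, it would reintroduce the risk this comment wants to avoid, because recent writes would exist only on the flash until they were flushed.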

I think the problem is that Jeff is speaking from a professional point of view. The average Joe simply doesn’t have that much money. (Of course there’s a warranty, but in places where mail-in service isn’t available, traveling to a repair center would probably cost 40% of the price of your SSD.)

And people just don’t appreciate how far HDDs have advanced from a mechanical perspective.

That is why I think Intel’s new SSD caching on Z68 (Smart Response Technology) will finally make a difference. It requires a drive of at least 20GB, so the feature should actually improve performance. The previous generation let you use as little as 2GB, which didn’t help at all.

If you think about it, your frequently used files, minus your multimedia files, don’t actually take up that much space. The frequently used parts of Windows 7 could fit within about 3GB (you don’t need Help files, drivers, backup files, etc.); those will continue to live on the HDD. With RAM being very cheap, if you put in 8GB of memory you could even disable the page file. I wouldn’t recommend disabling it with less than 8GB, just to be safe. And to all those who argue that Microsoft recommends keeping the page file on: Microsoft actually has the page file off by default in its Windows Embedded PC version and some other versions.

The rumors point to Intel selling this “Larson Creek” 20GB SLC cache for only $5x. This should give you 90% of SSD performance for a relatively low price. And you don’t have to worry about your SSD dying, because all of your data will still be intact on your HDD.

I recently became addicted to the speed of SSDs when I put a new Vertex 3 in my 2011 MacBook Pro. This thing is absolutely amazing, and I was willing to sacrifice drive capacity (I could only find a 120GB model) in return for speed.

My computer boots in less than 11 seconds and opening applications like Photoshop CS5 and Illustrator is almost instant. Now I’m a little worried about the imminent failure of my SSD but I keep good backups and by the time it fails there will probably be something faster available.