Oh, and to folk who think that a hardware RAID controller eliminates the write-performance problem of RAID 5… nope, sorry, it doesn’t really work like that.
The big problem with all the parity/ECC RAID schemes (apart from RAID 2 and RAID 3, which are rare these days) isn’t calculating the parity/ECC bits at all, not with the controllers and CPUs we have today.
The real problem is that typical writes seldom lay down a full, perfectly aligned stripe of new data covering every block in a stripe-width, overwriting everything that was there before.
Instead, a typical write changes only a subset of the data blocks in a stripe, yet you still need to know what the other blocks in that stripe contain in order to calculate the new parity/ECC. At the very least (assuming XOR parity) you have to read the old parity block and the current contents of any block you’re replacing; in the worst case you read every untouched block on every disk in the stripe and redo the whole parity calculation from scratch.
That incurs seek and read delays from the spinning rust, and causes more data to pass over the IO channels, clogging them up relative to the ideal scenario.
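If you want the arithmetic spelled out, here’s a toy Python sketch (my own illustration, nothing to do with any real controller’s firmware) of why the small write costs extra reads: with XOR parity the new parity is the old parity XORed with the old and new data, so you have to fetch the old data block and the old parity block before you can write anything back.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def full_stripe_parity(data_blocks: list[bytes]) -> bytes:
    """Full-stripe write: parity is just the XOR of all the new data blocks.
    No reads needed -- this is the cheap, ideal case."""
    return xor_blocks(*data_blocks)

def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write of a single data block in a stripe.

    Needs TWO extra reads (old data, old parity) before the TWO writes
    (new data, new parity):  P_new = P_old XOR D_old XOR D_new
    """
    return xor_blocks(old_parity, old_data, new_data)

# Toy 3+1 stripe: three data blocks plus one parity block.
d = [b"AAAA", b"BBBB", b"CCCC"]
p = full_stripe_parity(d)

# Overwrite the middle block and recompute parity the small-write way.
new_d1 = b"XXXX"
p_new = small_write_parity(d[1], p, new_d1)

# Sanity check: the shortcut agrees with recomputing parity from scratch.
assert p_new == full_stripe_parity([d[0], new_d1, d[2]])
```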
This read-before-write problem is why people aren’t keen, even now, on parity/ECC RAID for OLTP-type workloads, or even for heavily used filesystems where you leave file access-time updates switched on. For decision-support databases and other read-dominated workloads (metadata included), RAID 5/6/7/whatever is just a more resilient kind of stripe and is a clear win.
BTW, apart from WAFL and ZFS, most storage subsystems, all the way up to user level, still happily believe that a block read from a disk contains what the disk says it does, without cross-checking the other disks or recomputing the parity, even in a RAID.
They do this because disks are supposed to be reliable: they should either fail fast or at least be honest and up-front about unrecoverable errors. By not being über-paranoid you get more real throughput and lower latency out of the IO subsystem, which most of the time looks like a good trade.
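By contrast, the end-to-end checksum idea that ZFS and WAFL embody looks roughly like this toy Python sketch (my own simplification, not their actual on-disk format or API): keep a checksum for each block somewhere other than with the block itself, and refuse to believe any copy of the block until the checksum matches.

```python
import hashlib

def checksum(block: bytes) -> bytes:
    """Checksum of a block's contents (SHA-256 here purely for illustration)."""
    return hashlib.sha256(block).digest()

def verified_read(read_copy, copies: int, expected: bytes) -> bytes:
    """Read a block, trusting no copy until its checksum matches the one stored
    elsewhere. `read_copy(i)` is a hypothetical callback returning the bytes of
    copy i (e.g. one side of a mirror)."""
    for i in range(copies):
        block = read_copy(i)
        if checksum(block) == expected:
            return block   # this copy is good; a real system could also repair the bad one
    raise IOError("all copies failed checksum -- unrecoverable corruption")

# Toy demo: a two-way "mirror" where one side has silently rotted.
good = b"important payload"
mirror = [b"importGnt payload", good]   # copy 0 is silently corrupted
stored_sum = checksum(good)             # kept with the block pointer, not with the data

print(verified_read(lambda i: mirror[i], copies=2, expected=stored_sum))
```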
The bad news is that disks, controllers and IO busses can and do suffer Byzantine faults, and you end up with unacknowledged garbage reaching your programs and users from supposedly protected storage. Sometimes you even resilver bad data over good in a mirror. And I have to tell you that it’s a pain in the arse to figure out what’s happening, how to fix it, and how to contain and recover from the business/science/whatever impact of it all. Pick up your cane and Vicodin, stick on the deerstalker and disguise, and have at it.
And while ZFS RAID protects you against more hardware failures than ever before, and ZFS and WAFL snapshots help you recover more gracefully from user thinkos (“what do you mean, you didn’t want to delete your whole working directory?”), you still need those remote off-site backups to protect against fire/flood/hackers/mad-axeman/police-confiscation/etc…
Blah, sorry, I typed too much. Storage, it matters y’know?