Beyond RAID

People that read these posts and then say something along the lines of…

Congratulations on another pointless post.
Phillip on May 26, 2009 9:51 PM

deserve to be chemically castrated for the good of all humanity. I mean, seriously, if there’s something you don’t enjoy reading… how about (/smacks self on head w/ microphone) NOT READING IT!!! I’d have to be a world-class idiot to keep coming back to something I feel needs criticizing that much. It’s like buying a Playboy to make fun of how fat you think the models are. Seriously, man, just get over yourself and find something else to read.

Some would say that RAID 10 is so good it completely obviates any need for RAID 5, and I for one agree with them.

Only problem here is that you are “losing” 50% of your disks rather than a single disk.

Sure, the performance is better, but 50% is a pretty big cost.

For RAID1 you have:

Data is written across (n) drives, […] at the cost of half your overall storage.

Shouldn’t that be “at the cost of (n-1)/n of your overall storage”? You said that if any one drive survives, the data survives; so the data must be duplicated n times (once on each of n drives). Thus if you have 4 drives, the cost of redundancy is 3/4 of your drives.
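To make the arithmetic in this exchange concrete, here is a rough Python sketch of the fraction of raw capacity each layout spends on redundancy. It assumes n equal-size drives and the textbook definitions (RAID 1 as an n-way mirror, RAID 5 with one drive's worth of parity, RAID 10 as striped two-way mirrors). The function name `redundancy_cost` is made up for illustration.

```python
from fractions import Fraction


def redundancy_cost(level: str, n: int) -> Fraction:
    """Fraction of raw capacity spent on redundancy, for n equal-size drives.

    Assumes textbook layouts; real controllers may differ.
    """
    if level == "raid0":
        if n < 1:
            raise ValueError("RAID 0 needs at least 1 drive")
        return Fraction(0)            # pure striping, no redundancy
    if level == "raid1":
        if n < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return Fraction(n - 1, n)     # n copies, only one drive's worth usable
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return Fraction(1, n)         # one drive's worth of parity
    if level == "raid10":
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return Fraction(1, 2)         # every block is stored twice
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    print(redundancy_cost("raid1", 2))   # 1/2
    print(redundancy_cost("raid1", 4))   # 3/4
    print(redundancy_cost("raid5", 4))   # 1/4
    print(redundancy_cost("raid10", 4))  # 1/2
```

So "half your overall storage" is right for the usual two-drive mirror, and (n-1)/n is the general case for an n-way mirror.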

I disagree that RAID isn’t useful for desktop systems - but personally I prefer mirroring (and as an addition to, not replacement for, backups). Backups only run every 4 hours here, and I’d hate to lose 4 hours of work. And I’d hate having to wait for a replacement drive and rebuild the system; mirroring also saves me from that.

Striping is pretty meh. Sure, you get wonderful linear performance, but it doesn’t really help wrt random I/O… which is what you’re doing most of the time, unless you have pretty specific needs. And it doesn’t help with all game loadtimes. Heck, I even tried placing FarCry2 in a ramdisk, and it only gave a minimal performance boost compared to cold-loading from disk. Depends on what the game is doing during the loading phase, of course. But check Process Explorer I/O read stats while loading and you’ll see that a lot of games don’t even utilize single-drive bandwidth.

Jessica Boxer,

In reading your posting, I KNOW I have spoken with you before and would like to talk to you, privately. You have my e-mail addresses and Yahoo IM link. PLEASE get in touch!! I would greatly appreciate it! Thanks!

It’s worth noting that the global credit crisis (which preceded the global financial crisis) was caused by the economic equivalent of RAID-Z.

In brief, investment firms believed that when individually risky elements are clustered together (in this case, home loans), the outcome is low risk. And they believed you could lower the risk to zero by scaling up the number of elements.

The crisis was caused when the demand for investing in home loans was so great that they started accepting riskier and riskier loans as long as the interest rate scaled with the risk.

So what they ended up with was the equivalent of many huge data centres each with hundreds of racks containing dozens of Sun Sunfires filled exclusively with IBM Deathstar 75GXP drives and plastic explosive.
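The belief described above, that risk falls toward zero as you add elements, only holds if the failures are independent. Here is a toy Python sketch of the difference. The model and both function names are assumptions for illustration: each element fails with probability p, and with probability c a shared event (a bad drive batch, a housing crash) takes out everything at once.

```python
def loss_probability_independent(p: float, n: int) -> float:
    """Probability that all n elements fail, assuming independent failures."""
    return p ** n


def loss_probability_with_common_cause(p: float, n: int, c: float) -> float:
    """Probability of total loss when a shared event (probability c) kills
    every element at once; otherwise failures are independent."""
    return c + (1 - c) * p ** n


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n,
              loss_probability_independent(0.1, n),
              loss_probability_with_common_cause(0.1, n, 0.01))
```

Under this model the independent figure keeps shrinking as n grows, while the common-cause figure never drops below c, however many elements you pile on.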

“Congratulations on another pointless post.”

Stop congratulating yourself.

You still need those remote off-site backups to protect against fire/flood/hackers/mad-axeman/police-confiscation/etc…

Very important. JournalSpace, AVSim, and Ma.gnolia all suffered complete loss of all data recently. In each case, they had online backups where both copies were destroyed. No offsite backups, no tape backups.

Jeff: Thanks for writing a post that directly relates to what I do every day as I write firmware for a SAN.

@Jessica Boxer

If the data has lasting value (i.e., doesn’t expire), then you should have an archived/read-only copy of it. Even if you all but guarantee yourself against hardware failures, you still need to protect yourself from human error or maliciousness. So in a sense, you will always need an ‘offline backup’; it doesn’t have to literally be ‘offline’, it just has to be un-modifiable. For example, you could keep all your data in your live ZFS file system and take periodic read-only snapshots.

However, if the data expires or needs to be updated on a regular basis anyway, then it probably doesn’t matter. I’m guessing a lot of Google’s data falls into that category.
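For the periodic read-only snapshots mentioned above, a minimal Python sketch might look like the following. It only wraps the standard `zfs snapshot pool/dataset@name` command. The dataset name `tank/data`, the `auto-` naming scheme, and the helper names are assumptions for illustration, and it defaults to a dry run so nothing is executed unless you ask for it.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_name(dataset: str, when: datetime) -> str:
    """Build a timestamped snapshot name (hypothetical naming scheme)."""
    return f"{dataset}@auto-{when.strftime('%Y%m%d-%H%M%S')}"


def snapshot_command(dataset: str, when: datetime) -> list:
    """Return the argv for `zfs snapshot <dataset>@<name>`."""
    return ["zfs", "snapshot", snapshot_name(dataset, when)]


def take_snapshot(dataset: str, dry_run: bool = True) -> list:
    """Take a snapshot now; with dry_run=True only return the command."""
    cmd = snapshot_command(dataset, datetime.now(timezone.utc))
    if not dry_run:
        # Requires ZFS and sufficient privileges on the host.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "tank/data" is a placeholder dataset name.
    print(" ".join(take_snapshot("tank/data")))
```

Run from cron or a similar scheduler, this gives you the un-modifiable copies described above. Snapshots live in the same pool as the live data, so they guard against human error, not against losing the pool itself.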