Gigabyte: Decimal vs. Binary

Everyone who has ever purchased a hard drive finds out the hard way that there are two ways to define a gigabyte.  


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/09/gigabyte-decimal-vs-binary.html

“If you’re wondering where 35 megabytes of your 500 Gigabyte drive just disappeared to”

You mean 35 gigabytes, presumably.

Yep, I meant 35 Gibibytes! :slight_smile:

You said it in the middle there: it’s us 1024 types who are in the wrong here, and the sooner we give up this particular windmill the better we will all be.

When I’m buying a hard drive, I really don’t care that the 1TB drive has 931 “GB” of space. I do care that one “1 TB” drive will have significantly more space than another “1 TB” drive, but I can usually find that particular info on the side of the box or the manufacturer’s web site after a bit of digging.
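For anyone who wants to check the 931 figure, here is a minimal Python sketch of the arithmetic. The function name `shortfall_percent` is made up for illustration; the only assumption is the usual one that decimal prefixes step by 1000 and binary prefixes step by 1024.

```python
# Decimal (SI) prefixes step by 1000; binary (IEC) prefixes step by 1024.
PREFIXES = ["kilo", "mega", "giga", "tera"]

def shortfall_percent(power: int) -> float:
    """Percent by which 1000**power falls short of 1024**power."""
    return (1 - 1000**power / 1024**power) * 100

for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {shortfall_percent(power):.1f}% smaller than the binary unit")

# A drive sold as 1 TB (10**12 bytes), expressed in binary gigabytes (GiB):
one_tb_in_gib = 10**12 / 2**30
print(f"1 TB = {one_tb_in_gib:.1f} GiB")
```

The gap widens with every prefix step, which is why it was easy to ignore for kilobytes and is hard to miss for terabytes.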

I don’t know how you go shopping for a new hard drive, but I look at what data I have today, compare that to what I had a year or so ago to come up with a yearly growth factor, then look for hard drives which are about two years’ worth larger than my current needs. Then, I look at what’s there, decide the wife would have a fit if I spent $1000 on a new drive, and compromise down.

EVEN IF that number that I started out with (how much space I’m using today) was in *bibyte values (as typically reported by the OS), the end decision wouldn’t change at all, because anything within, say, 25% of the desired size is just lost in the noise.

Still, though, when I look at how much space I am using, I use the byte counts (OS X gives that in parentheses right next to the “*bibyte” “friendly” count so it’s not like this is much more work to come up with; seems like Windows offers a quick route to the actual byte counts too, right?).

All in all, it seems a moot point. So, in the end, I agree with Alan Green’s friends.

I first noticed this disparity when I bought my first CD-R. Since then, I orient myself by the much more meaningful unit “minutes”, usually the minutes of CD-quality audio that can be packed on a CD-R (or of MPEG-2 compressed movies when it’s DVD+/-R(W)).

I’ve noticed (or maybe this is my imagination), that DVRs use the “minutes of video”, too, for their hard drive capacities.

I don’t know about you, but I can grasp “30 hours of video” better than “500 GB”.

You’re so right - those binary prefixes sound soo ridiculous. Like teletubbies, if anyone remembers. :wink:

I don’t understand the assertion that hard drive makers are pulling a fast one by using ‘real’ SI units in order to make the capacity seem higher. The only thing anyone ever does with that number is to compare it to other hard drive capacities, so what’s the problem?

Also, I’m hard-pressed to think of a scenario where you really need to keep in mind the discrepancy between the two definitions. When are you ever comparing hard drive GBs vs. something legitimately measured in GiBs?

Ha! I’ve never even heard of a kibibyte.

The annoying thing with the power of 2 vs power of 10 stuff is that it didn’t use to be that way. The vendors switched at some point to juice their numbers - extremely weaselly.

“It’s us computer science types who are abusing the official prefix designations”.

That’s one interpretation. I often prefer to think that we’re challenging SI’s jurisdiction. Since metric units of measurement predate any common usage of non-base-10 numbers by quite some time, my guess is that the decision that kilo was base 10 was pretty arbitrary anyway.

Why not have the unit’s suffix determine the base, instead of the prefix? It’s not just a bytes issue – kibimeters is patently ridiculous too. Bytes should always be base 2, meters should always be base 10, and if not, it should be specified explicitly.

If the above diagram is correct (I’m sure it is), when “giga” was specified in 1960, there was already prior base 2 usage. In other words, they broke pre-existing business logic, so it’s their fault!

:wink:

“The annoying thing with the power of 2 vs power of 10 stuff is that it didn’t use to be that way. The vendors switched at some point to juice their numbers - extremely weaselly.”

I’d like to see proof of this. I don’t think there ever was a switch; I think it always was like this. The oldest hard drive ad I can find advertises 5 million 7-bit characters. Floppy disks and CDs were in 1000 byte kilo/mega bytes. Except some floppies which were even weirder with 1024 * 1000 byte megabytes. Can anybody produce a hard drive advertisement with 1024 * 1024 megabytes as the unit?

The second calculation has “465” twice. The first instance should be “500”.

The problem has an easy solution:

The OS needs to start using SI prefixes correctly. Linux already does.

$ dd if=/dev/zero of=test bs=1MB count=10
10+0 records in
10+0 records out
10000000 bytes (10 MB) copied, 0.0261481 seconds, 382 MB/s

There’s no reason to use powers of two for displaying file sizes. It’s ridiculous and makes it more confusing for the user.
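The `bs=1MB` in the dd run above matters: GNU dd reads `MB` as 1000 * 1000 bytes and `M` as 1024 * 1024 bytes. A rough Python sketch of that suffix convention follows. `parse_size` and `MULTIPLIERS` are hypothetical names for illustration, not part of dd, and only a handful of suffixes are covered.

```python
import re

# Suffix multipliers following the GNU coreutils convention:
# "kB"/"MB"/"GB" are powers of 1000, while "K"/"M"/"G" are powers of 1024.
MULTIPLIERS = {
    "": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3,
    "K": 1024, "M": 1024**2, "G": 1024**3,
}

def parse_size(text: str) -> int:
    """Convert a string such as '1MB' or '1M' to a byte count."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None or match.group(2) not in MULTIPLIERS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * MULTIPLIERS[match.group(2)]

print(parse_size("1MB") * 10)  # bytes written with bs=1MB count=10
print(parse_size("1M") * 10)   # bytes written with bs=1M count=10
```

With `bs=1MB` the total is the 10000000 bytes shown in the output above; with `bs=1M` the same command would write 10485760 bytes.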

It should be noted that network speed has ALWAYS been in base 10.

Your ancient 10BASE-T Ethernet card? That was 10 million bits/second. A gigabit card is 1000 Mbit/s, not 1024.
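To put those link speeds next to file sizes, here is a small Python sketch that converts a raw bit rate into decimal and binary megabytes per second. It assumes 8 bits per byte and ignores protocol overhead, so the figures are upper bounds rather than real-world throughput; `link_rate` is a made-up helper name.

```python
BITS_PER_BYTE = 8

def link_rate(bits_per_second: int) -> tuple[float, float]:
    """Return (decimal MB/s, binary MiB/s) for a raw link rate in bits/s."""
    bytes_per_second = bits_per_second / BITS_PER_BYTE
    return bytes_per_second / 1000**2, bytes_per_second / 1024**2

for name, rate in [("10BASE-T", 10 * 10**6), ("Gigabit", 1000 * 10**6)]:
    mb, mib = link_rate(rate)
    print(f"{name}: {mb:.2f} MB/s = {mib:.2f} MiB/s")
```

A gigabit link tops out at 125 MB/s in decimal units, which an OS reporting in powers of 1024 would show as roughly 119.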

“Why not have the unit’s suffix determine the base, instead of the prefix? It’s not just a bytes issue – kibimeters is patently ridiculous too. Bytes should always be base 2, meters should always be base 10, and if not, it should be specified explicitly.”

(first, it’s not base-2; it’s base-1024. What would “deka” and “hecto” map to in “base 2”? “8” and “128”?)

Huh? Why would I want a completely different series of multipliers with the same name as the universal (not just meters: EVERY SI unit of measure uses the base-10 system!) standard? Why propagate confusion? Instead of just learning “kilo- means 1000”, the rule is “kilo means 1000 when the base is meter, liter, gram, watt, ampere, joule, [list continues for a page or so and needs to get updated with each new “thing” to measure]; it means “1024” when the base is byte”? That’s ludicrous!

First: what reason is there for defining a kilobyte as 1024 bytes instead of 1000? Why not measure in 2^8 increments (256, then 65536, then 16777216, etc), since that reflects the number of 8-bit bytes used to store it? Does anyone use 10-bit bytes anymore? Is, in fact, the ONLY reason we use 1024 as the base for file sizes that it APPROXIMATES 1,000 as a power of 2???

Second: why on earth are OSes still so bass-ackwards and reporting these 1024-power numbers instead of a sensible standard that matches what the rest of the world uses?

I worked on the system information utility for Windows (msinfo32.exe). There were periodic bugs filed of the form “I have a 500GB drive and msinfo only reports it as 465GB!”.
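The 500 vs. 465 discrepancy in those bug reports is just the same byte count formatted two ways. A minimal Python sketch, assuming the drive holds exactly 500 * 10**9 bytes; `format_size` is a hypothetical helper and not how msinfo32 actually does it:

```python
def format_size(num_bytes: int, binary: bool) -> str:
    """Format a byte count with decimal (kB, MB, ...) or binary (KiB, MiB, ...) units."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB"] if binary
             else ["B", "kB", "MB", "GB", "TB"])
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base

drive_bytes = 500 * 10**9  # what the box calls "500 GB"
print(format_size(drive_bytes, binary=False))  # 500.00 GB
print(format_size(drive_bytes, binary=True))   # 465.66 GiB
```

Nothing is lost between the two lines; a tool that prints the second number but labels it “GB” is what makes about 34 “GB” appear to vanish.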

You can’t please everyone. If it was changed to powers of ten it wouldn’t match the rest of the operating system. And if you changed the whole operating system, it wouldn’t match legacy systems. It’s hard to turn that barge once it has momentum…

Jim: you’ve got to start somewhere. The programs that currently use “KB” to describe 2^10 are wrong. File a bug. Linux is currently in a transitional period. Nautilus has several bugs filed against it for reporting file sizes incorrectly… most GNU programs, like the one I demonstrated above, are correct.

I don’t understand why geeks are so set on using power-of-two divisions. Jeff’s complaint is that he doesn’t want to sound stupid by saying “kibibyte”; it never even occurred to him to use powers of ten. WTF?

If you’re going to use base-10 numbers, you should be using base-10 prefixes. As soon as you start talking about a 0x1f4 GiB hard drive you can start complaining.

Speaking of Google and bytes, Google still thinks 1 kilobyte (1 KB, to them) is equal to 1024 (2^10) bytes: http://www.google.com/search?hl=en&q=1+KB+in+bytes

Google’s wrong.

http://www.google.com/search?hl=en&q=1+megabit+in+bytes&btnG=Search

That’s not correct by anyone’s definition.

Personally I think zetta and yotta are pretty lame prefixes, so I’m glad I won’t have to deal with saying them in my lifetime. Maybe I’m just more familiar with the smaller prefixes, even the relatively exotic peta and exa. When was the last time you actually needed to express something in terms of petabytes?

My children can deal with the yobibi and zebibi controversy.

Shannon: “The worst was the “1.44MB” disks. These are actually 1044 kilobytes. (1044 * 2 ^ 10 bytes, in case people aren’t keeping up.)”

I think you mean 1440 kilobytes (1440 * 1024 = 1,474,560 bytes).
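The “1.44” label only works out if a “megabyte” is taken as 1000 * 1024 bytes, a mix of both conventions. A quick Python check of the three readings, starting from the 1440 * 1024 figure above (variable names are just for illustration):

```python
FLOPPY_BYTES = 1440 * 1024  # 1440 blocks of 1024 bytes each

decimal_mb = FLOPPY_BYTES / 1000**2      # SI megabytes
binary_mib = FLOPPY_BYTES / 1024**2      # IEC mebibytes
mixed_mb = FLOPPY_BYTES / (1000 * 1024)  # the hybrid unit behind the "1.44" label

print(f"{FLOPPY_BYTES:,} bytes")
print(f"{decimal_mb:.3f} MB (decimal)")
print(f"{binary_mib:.3f} MiB (binary)")
print(f"{mixed_mb:.2f} 'MB' (1000 * 1024 bytes each)")
```

So the disk is about 1.47 MB in decimal units and about 1.41 MiB in binary units, and 1.44 in neither.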