a companion discussion area for blog.codinghorror.com

Gigabyte: Decimal vs. Binary


#121

Jeff-

Back in the days of FAT16 (Windows 95), “large” (4 gig) hard drives suffered from inefficiency due to cluster size. For example, I believe under FAT16 the smallest cluster size was 32K, so if you had a 1K file, it would waste 31K. FAT32 improved on this; I believe space was still lost, but not as much, since clusters smaller than 32K could be defined.
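For anyone who wants to check the arithmetic, here is a minimal Python sketch of the cluster overhead. The function name `slack_bytes` and the 4K comparison size are picked purely for illustration; they don't come from any real filesystem API.

```python
def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated but unused when a file is stored in whole clusters."""
    if file_size == 0:
        return 0  # an empty file needs no data clusters
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size - file_size

KIB = 1024
print(slack_bytes(1 * KIB, 32 * KIB) // KIB)  # 31 (the 1K file in a 32K cluster)
print(slack_bytes(1 * KIB, 4 * KIB) // KIB)   # 3  (same file, assumed 4K cluster)
```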

So, a 1 Terabyte drive is for “marketing” purposes by the hard drive manufacturer. You’ll never physically store 1 Terabyte.

I’m sure someone can explain the gory details on this better than I can.

Jon Raynor


#122

Gigabytes weren’t an SI unit until the IEEE decided to make them one. It’s rather insulting, actually - reminds me of the gritty cop shows where the FBI would step into a police investigation and say “You guys go home, let the experts handle this”. Except that the FBI actually has that authority, whereas the IEEE just wishes it does.

The meaning of SI prefixes only applies to SI (metric) units, which bytes aren’t. A byte is already 8 bits, and it isn’t divisible into centibytes or millibytes, so it doesn’t even make sense as a metric unit. The composite units were so named because they approximated metric units, not because they were equivalent. That’s not “wrong”; it happens in every industry, it’s just that the nerds in other industries don’t kick and scream anal-retentively about it.

For those claiming that the inconsistency was always there because the network industry used kbps, think again. One of the reasons they used kiloBITS per second was to disambiguate it from storage units. The term was adapted from baud - slightly different meaning, but essentially equivalent by the time of 14400 baud modems, when baud was becoming an awkward measurement anyway. There was a legitimate need to compare bandwidth with storage (the Internet), but it also did not make sense to use powers of 2 because bandwidth was actually provided in powers of 10 (bits). There was no foul play here, just pragmatism.

Memory capacity, on the other hand, is always 2^n bytes. Hard drives are generally multiples of 512, too; when 500 gigabytes is used to mean 500 * 10^9 bytes, it is actually an approximation. The real number might be something like 499,289,948,160 bytes, though it could be more or less depending on the geometry. 500 GB is never quite accurate using ANY convention.
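A quick sanity check of that example figure, as a Python sketch. The 512-byte sector size is the traditional value assumed here, and the variable names are just for illustration.

```python
SECTOR = 512  # bytes per sector (assumed traditional size)

actual = 499_289_948_160   # the example figure quoted above
nominal = 500 * 10**9      # what "500 GB" means in powers of 10

sectors, remainder = divmod(actual, SECTOR)
print(sectors, remainder)  # 975175680 0 -> a whole number of sectors
print(nominal - actual)    # 710051840 bytes short of the decimal label
print(actual / 2**30)      # 465.0 -> size in 2^30-byte units
```

So this particular number is a clean multiple of 512 and of 2^30, yet it matches the "500" label under neither convention.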

I think it’s obvious that the units for memory and disk should be the same, since data is constantly being swapped from one to the other. So let’s put the question about why the rules for memory should apply to hard drives to rest.

Of course I know what the proposed solution is. Just have everyone switch to the dorky “bi” prefixes! That’s nice, except that every part of the industry EXCEPT for the hard drive manufacturers has been using the same convention for 50 years. You don’t just stomp your foot, shake your fist and tell us to mend our evil non-standard ways. Standards should reflect conventions that are already widely used, not fight them. Frankly, I’d rather deal with the hard drive capacity gap than deal with the silly new SI units invented by academic suits with hardly any practical experience.


#123

Whenever one inured to the inconsistent KB/MB/GB definitions used in some computing contexts first hears the kibibyte, mebibyte, gibibyte KiB/MiB/GiB construction, they think it silly. I did, too.

But after a few years of being bitten by related problems, and having to explain/argue the exceptions, and familiarity with the new words/abbreviations, it looks better.

The use of powers-of-2 internally by computers is an implementation detail that only insiders need to optimize for, in their minds and communications. For everyone else, base-10 works better. There’s no reason for average users to understand or even see KiB, MiB, GiB names/numbers, in disk sizes, file sizes, bandwidths, clock speeds, etc. Everything can and should be in base-10, shift-units-at-a-glance SI. And the proportion of average users to insiders keeps growing. SI will win.

As for Jeff’s question about ever needing to use ‘petabytes’: many workplaces are now dealing with petabytes of data. We have a few petabytes of spinning disks at the Internet Archive; I know commercial and big-science entities have far more.

And, regarding being “glad I won’t have to deal with saying” zetta and yotta, why so pessimistic about the progress of technology and/or your own lifespan?


#124

Sebastian: Err, yes. You’re right - 1440, not 1044 KB. Sorry about that!

Sean: There are very good reasons why RAM is going to be in powers of two - it would be quite a lot of effort to allow for 1000 megabytes of RAM on one DIMM and 1000 on another, compared to 1024 on each. (RAM is addressed by a computer on an address bus. Each line of that address bus is a bit of the address; allocating addresses to RAM DIMMs thus naturally falls on the boundary of an address bus line, which translates to a power of two in the address space. That’s why 1 GB of RAM is always going to be 2^30: 10^9 is not going to divide easily on an address bus. Doing divisions by 1000 is going to add an extra cycle or two to every RAM access, plus some extra chips!) Hard drives aren’t addressed this way, so that’s why they can be sizes that aren’t otherwise ‘nice’ for computers.
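A rough sketch of the address-decoding argument, in Python rather than hardware. The function names and the 2^30-byte DIMM layout are made up for illustration; real memory controllers are considerably more involved.

```python
DIMM_BITS = 30               # each hypothetical DIMM holds 2^30 bytes
DIMM_SIZE = 1 << DIMM_BITS

def decode_pow2(addr: int) -> tuple:
    """Top address lines pick the DIMM, low lines pick the byte: shift and mask only."""
    return addr >> DIMM_BITS, addr & (DIMM_SIZE - 1)

def decode_decimal(addr: int, dimm_size: int = 10**9) -> tuple:
    """With 10^9-byte DIMMs the same split needs a real division."""
    return divmod(addr, dimm_size)

addr = 0x4000_0010           # 2^30 + 16
print(decode_pow2(addr))     # (1, 16)
print(decode_decimal(addr))  # (1, 73741840)
```

In the power-of-two case the "which DIMM" answer is literally a subset of the address wires, which is the point being made above.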

Will: Modems always were weird, mainly because they (usually) used 10 bit bytes. Yeah, I was aware of some confusion with communications people, but I got the impression they hardly dealt with bytes anyway.

(Meanwhile, we’re skipping the octet vs byte debate? 8 bits wasn’t always standard, you know! :) )


#125

‘Aaron G’ writes: “A byte is already 8 bits, and it isn’t divisible into centibytes or millibytes, so it doesn’t even make sense as a metric unit.”

In information theoretic contexts, even bits can be fractional. And when describing very slow links, it could be meaningful to speak of such exotic and peculiar things as centibytes or centibits per second.

Contrived and weird, yes, but not totally nonsensical.


#126

I see no reason why there can’t be an option for using both kilobytes (1000 bytes) and kibibytes (1024 bytes), like the labels here that say something like “1Gal. (3.8L).” Slowly people will begin to understand the relation between the two, like how many people learn that a yard is almost a meter.

As for sounding ridiculous, that’s just ridiculous. They may sound funny, but so does the mole (mol) and the joule (J). In fact, my chemistry teacher in high school had us make a mole (the animal) for a grade!

Once again, the drives could use the metric standard and the binary standard, as in “500GB (465GiB),” allowing consumers to see the difference and keep them happier with the manufacturers because they knew the two possible measures that could be used, instead of feeling they were ripped off.
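A dual label like that is cheap to generate. Here is a minimal Python sketch; the function name `dual_label` is hypothetical, and it truncates the GiB figure to match the “500GB (465GiB)” example rather than rounding.

```python
def dual_label(decimal_gb: int) -> str:
    """Format a capacity given in 10^9-byte gigabytes with its GiB equivalent."""
    size_bytes = decimal_gb * 10**9
    gib = size_bytes // 2**30  # truncate, as in the 500GB (465GiB) example
    return f"{decimal_gb}GB ({gib}GiB)"

print(dual_label(500))   # 500GB (465GiB)
print(dual_label(1000))  # 1000GB (931GiB)
```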

In the programming sense, using the standard 10^x is a rather annoying convention because of the nature of bits - 0 or 1. If they were to somehow come up with a 10 state bit (easily possible with quantum computers), then I could see the case for using the standard metric definitions, but until then, no thanks. This difference in systems - base 2 instead of base 10 - led to the rise of other counting systems, such as octal and hexadecimal (hex). Personally, I like to count memory and the like in hex in the binary notation. In hex, the “strange” numbers tacked onto the end disappear. For example:

1024 Bytes = 0x00000400 Bytes = 1 KiB
1024 KiB   = 0x00100000 Bytes = 1 MiB
1024 MiB   = 0x40000000 Bytes = 1 GiB
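The same table can be reproduced in a few lines of Python, as a sketch. The helper name `hex_size` is just for illustration.

```python
def hex_size(power: int) -> str:
    """Return 2**power as an 8-digit hex literal."""
    return f"0x{2**power:08X}"

for power, name in [(10, "KiB"), (20, "MiB"), (30, "GiB")]:
    print(f"1 {name} = {hex_size(power)} bytes = {2**power:,} bytes")
# 1 KiB = 0x00000400 bytes = 1,024 bytes
# 1 MiB = 0x00100000 bytes = 1,048,576 bytes
# 1 GiB = 0x40000000 bytes = 1,073,741,824 bytes
```

The hex column is a single nonzero digit each time, while the decimal column picks up the trailing 24, 48,576 and so on.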

It also comes in handy to use the KiB notation in small systems, where you need to know exactly how much memory you have left and if it’s enough for a 4KiB image.

Oh yes, the reason we use binary measurements is because computers use binary! Addressing for both RAM and hard disks is done using the binary/hex system. That being the case, it makes sense to me that they use the binary versions of the prefixes, but that would confuse people. So once again, I think listing both notations on the package makes plenty of sense.

Not to mention, if a byte were a standard SI unit, then it would be made of 10 bits. Then you could have a real decibyte. But naturally, if there was such a change, all the software out there right now would wind up being pretty useless because it isn’t built for 10 bit architectures (although that can fairly quickly be remedied).

In the end, I think placing both labels on products will help get people used to the relation of a GB and a GiB. I have started to be able to tell approximate size of large files from one system to another, similarly to the conversion of yards and meters.

Anyhow, that’s what I think.


#127

I was completely unaware of these rules. This was SO helpful- cleared up a lot in my head! Thanks for this!


#128

Honestly, I think that it’d be ultimately better to use a convention like this:

16 bytes = 1 da16B = 0x10 bytes
16^2 bytes = 1 h16B = 0x100 bytes
16^3 bytes = 1 k16B = 0x1000 bytes
etc.

and the number before the unit written in hexadecimal.

The manufacturer could provide this for anyone who’d find it useful then show a decimal conversion in scientific notation (x.xx * 10^x) for general consumers who really only care about scale.


#129

The binary prefix symbols are great. The solution to not sounding like you have a speech impediment is to pronounce them kili, megi, teri (KIL-ee, MEG-ee, TER-ee) etc.