Gigabyte: Decimal vs. Binary

Memory sizes weren’t originally binary, they were originally decimal, just like hard disk sizes. 8,000 digits of RAM meant 8,000 decimal digits, and if that’s what you had, then 8K wasn’t even an approximation, it was exact. 20,000,000 digits of disk storage meant 20,000,000 decimal digits, and if that’s what you had, then 20M wasn’t even an approximation, it was exact.

Of course you can fit more information into memory by storing values in pure binary instead of in BCD. If you spend 4 bits (plus a parity bit and other overhead) to store only the digits 0 to 9, you're wasting resources. Computations in BCD are also far slower than in pure binary. But it was easier for customers and programmers to use decimal, so computer manufacturers delivered BCD.
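To make the waste concrete, here is a minimal sketch (Python, purely illustrative) comparing how many bits BCD and pure binary need for the same value:

    import math

    def bcd_bits(n):
        """Bits needed to store n in BCD: 4 bits per decimal digit."""
        return 4 * len(str(n))

    def binary_bits(n):
        """Bits needed to store n in pure binary."""
        return max(1, math.ceil(math.log2(n + 1)))

    n = 99_999_999            # the largest value 8 decimal digits can hold
    print(bcd_bits(n))        # 32 bits in BCD
    print(binary_bits(n))     # 27 bits in binary - same information, ~16% fewer bits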

Then computer manufacturers decided that customers and programmers could understand binary well enough, so in order to maximize storage capacity and speed with the same amount of resources, they started delivering pure binary instead of BCD.

Well, it looks like they were wrong. Customers don’t understand binary well enough. Even Jeff Atwood gets confused. Computer manufacturers should have stuck with BCD.

Rob wrote:
Yeah, but then you get to the next Monty Python question - are those 30 hours MPEG-2, DivX, DVD-quality, HD-quality at 720, 1080, or…? Same with the “songs” metric for iPods - bitrate isn’t taken into account, just the default length of a pop song.

Well, for movies it is usually MPEG-2 (since it is the most common format for consumers), and for music 128 kbit/s VBR MP3 (most common, again).
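As a rough back-of-the-envelope check (the song length and player capacity below are my assumptions, not the commenter’s):

    bitrate = 128_000              # bits per second (128 kbit/s MP3)
    song_seconds = 4 * 60          # assume a 4-minute pop song
    song_bytes = bitrate * song_seconds / 8
    print(song_bytes / 1e6)        # ~3.84 MB per song

    capacity = 80e9                # a hypothetical 80 GB (decimal) player
    print(capacity // song_bytes)  # ~20833 songs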

But yes, it still needs qualifiers. Still, it is more “life-like”, in a “What can I do with so much space?” kind of way, than the dry, scientific, and very-much-geek-friendly count of bytes.

In a way, it is a usability thing: It puts something in relation to something else familiar. Which is, of course, error prone. No silver bullet, either.

Why do I get the feeling you wrote this entire post to deliver the pun at the end?

From my understanding IEC prefixes were developed by Mushmouth formerly of Fat Albert and the Cosby kids during his brief employment at the Institute of Electrical and Electronics Engineers.

Baud is not bits per second, it is symbols per second. That is something completely different when a symbol consists of more than one bit.
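For instance (illustrative numbers): a modem signalling at 2,400 baud with a modulation that packs 4 bits into each symbol delivers 9,600 bit/s:

    baud = 2400          # symbols per second
    bits_per_symbol = 4  # e.g., 16-QAM carries 4 bits per symbol
    bit_rate = baud * bits_per_symbol
    print(bit_rate)      # 9600 bit/s - four times the baud rate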

The confusion is even larger than I thought it was. Our job as developers is to be precise in our specifications, designs and code, so we have to change something. Too bad so many here are not willing to see that there is a problem, or expect that the rest of the world should change.

Shannon wrote: “Honestly, it never used to be a problem. SI prefixes have always been powers of two for binary quantities (which is only bytes) and powers of ten for decimal quantities.”

I don’t think so. If I’m not mistaken, kilobytes/second has always meant 1000 bytes/second, when referring to modem transfer rates.

The issue is that the use of the kilo/mega/giga prefixes is ambiguous, even if followed by the word “byte”. For anyone who thinks this whole discussion is stupid: you won’t mind if I borrow $1024 from you (a kilobuck), and eventually pay you back $1000 (a kilobuck)?

I work in software development with some very smart people. If I had a file 5,323,123 bytes in size, and someone asked me “How big is that file?”, I would be forced to answer “about 5.3 megabytes”. You know why? Because, in an informal discussion, they would understand “5.3 megabytes” to mean “about 5,300,000 bytes”. If I had bothered to do the math and answered (more correctly) “about 5.08 megabytes”, pretty much no one would think I meant “about 5.08 * 1024 * 1024 bytes”. Binary calculations may be easy for computers - not so for human beings. We all know that the proper convention for kilo/mega/gigabyte is binary, but in practice, nobody is shy about using those prefixes in the decimal sense. And nobody says “5.3 million bytes”, although you’d think that’d be a reasonably unambiguous alternative.
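A tiny sketch (mine, purely illustrative) of the two readings of the same byte count:

    def human_size(n_bytes, base):
        """Format a byte count with kilo/mega/giga prefixes, using the
        given base (1000 for decimal units, 1024 for binary units)."""
        for prefix in ('', 'K', 'M', 'G', 'T'):
            if n_bytes < base:
                return f'{n_bytes:.2f} {prefix}B'
            n_bytes /= base
        return f'{n_bytes:.2f} PB'

    size = 5_323_123
    print(human_size(size, 1000))  # 5.32 MB (the "about 5.3 megabytes" answer)
    print(human_size(size, 1024))  # 5.08 MB (the strictly binary reading)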

For those who still say it’s not ambiguous - to the average customer it is, for all practical purposes, if they have to remember/understand that:

  • Windows/Linux will report memory/file size in binary units
  • Hard drive sizes are specified in decimal units
  • DVD sizes are specified in decimal units (a 4.7 GB DVD packed full of data will show up as ~4.38 GB in your favourite OS)
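The gap is pure arithmetic, as a quick sketch shows (illustrative; it assumes the OS divides by 1024^3):

    GIB = 1024 ** 3  # what the OS reports as a "GB"

    for label, marketed_gb in (('4.7 GB DVD', 4.7), ('250 GB drive', 250)):
        reported = marketed_gb * 1e9 / GIB
        print(f'{label}: shows up as ~{reported:.2f} GB in the OS')

    # 4.7 GB DVD: shows up as ~4.38 GB in the OS
    # 250 GB drive: shows up as ~232.83 GB in the OS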

And just try explaining to the average person why this is so. I’ve seen plenty of people asking the following question on various PC/gaming/tech forums:
“I just bought a 250 GB hard drive. How come it only shows up as 232 GB (or whatever) in Windows?”

Among the misleading answers I’ve seen:
“Windows ‘lies’ to you about the disk space”
“The hard drive manufacturer ‘lies’ to you about the drive size”
“250 GB is the ‘unformatted’ capacity. After you format the drive, you only have 232 GB left” (*)

Just because the industry has been doing the wrong thing for decades doesn’t mean it’s correct or user-friendly to continue doing so.

(*) You may be laughing about how bogus this explanation sounds, but I’ve heard it from professional IT managers. If someone in the industry cannot be bothered to know/understand that GB has 2 meanings, good luck explaining that to the average Joe on the street. This is by no means a criticism of them - it is actually an indictment of the tech industry. It’s no wonder that “techies” have a rep for poor communication skills, since we feel the need to redefine well-known prefixes with ambiguous meanings.

They should label the HDDs like monitors. 19"(17.4" Viewable), 500GB (465GB Usable).

I’m not sure a field that has wholeheartedly embraced silly made-up unit names (byte, nybble), in-jokey recursive acronyms (GNU, LAME), stupid, obfuscatory acronyms (PCMCIA), and the brain-melting stupidity of deliberately meaningless non-acronym acronyms (NT, XP, .NET) has any standing to complain that kibibytes and mebibytes “sound ridiculous.” We’re already soaking in ridiculous. Pissing on “mebi” et al is like pissing into an ocean of piss.

While it may be natural to use ordinary decimal measurements for storage media like optical and hard disks, solid-state storage like RAM and flash are likely to come in powers-of-two-sized chunks for the foreseeable future. That particular opportunity for confusion isn’t likely to be cleared up entirely by some simple policy change.

How is NT a “non-acronym acronym”? Looks like a regular acronym to me. It even makes sense, which is something I can’t say about every acronym.

As any sys-admin from the 90’s knows, NT is an acronym for “Nice Try”.

Anyone with a programming history going back at least to the 16-bit machines is likely to think something like:

1 byte = 8 bits
1 KB = 1024 bytes
1 MB = 1024 KB
1 GB = 1024 MB
1 TB = 1024 GB

If GB is 10^9 bytes for hard drives, then consistency dictates the same for memory. So your memory would actually come in chunks of 1.073741824 GB.

“You need to upgrade this machine to at least 2.147483648 GB of memory”
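Those awkward figures are just powers of two restated in decimal units, as a quick check shows (mine):

    for gib in (1, 2, 4):
        print(f'{gib} GiB = {gib * 1024**3 / 1e9} GB')

    # 1 GiB = 1.073741824 GB
    # 2 GiB = 2.147483648 GB
    # 4 GiB = 4.294967296 GB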

For programmers, a notation where 1 KB = 1024 bytes makes sense if you do anything related to memory.

It’s just a sad day when it got hijacked by the SI standards crowd.

It’s not like there are things like centibytes or anything. Or maybe we should create a “decibyte” which then would be a practical 1/10th of a byte, or in other words 0.8 bits.

Yeah. Now we’re talking.

Nibbles, here we come.

Andrew Russell wrote:
This is why Linux won’t be a consumer operating system - because it has the attitude that this whole messy rename thing should be exposed to end-users… I’d say it’s a form of elitism.

Why are you saying this like it’s a bad thing? We have a consumer operating system, from Microsoft, and it’s a poor product for a premium price. And exposing end-users to knowledge? Oh, no, where will this horror end!

We should expect people to raise their standards, not lower ours. Elitism is good. When did striving to be better informed and more capable become some kind of insult?

‘They should label the HDDs like monitors. 19"(17.4" Viewable), 500GB (465GB Usable).’

That isn’t the best analogy.

A) For the 2 reported measurements of the CRT monitor, there’s a discrepancy in what’s being measured (‘tube size’ versus ‘viewable size’), but the unit of measurement (inches) is the same.

B) For the 2 reported measurements of the hard drive, what’s being measured (storage space) is the same, but the unit of measurement (decimal GB vs. binary GB) is different.

To say a hard drive has 500 GB (465 GB usable) of space is misleading to say the least. I could just as easily turn around and say that 500 GB are usable, as long as you define 1 GB = 1,000,000,000 bytes. And as we all know, that is exactly what hard drive manufacturers do.

J. Stoever: Microsoft originally said NT meant “New Technology.” As opposed to that Old Technology they usually push, I guess. Later Microsoft divorced itself from that acronym, insisting that NT didn’t stand for anything and adopting it as a simple product-line name. The Windows 2000 boot screen even says “built with NT technology,” which clearly makes no sense if NT is an acronym.

Is there any other example of an ISO standard that redefines accepted usage and makes up something totally new to replace it? That strikes me as exactly what standards bodies are not supposed to do.

Jeff, I think we ought to just forget about making the power-of-two types use the silly IEC names. It’s just not going to happen.

Alternatively, we could go on writing “megabyte”, but use the abbreviation “MiB”. I wouldn’t mind using the abbreviations, but I’m just not going to say “meh-buh-bite”. It’s just silly.

Who can name the bigger number?

Just a follow-up to Jeff’s digression about really big numbers: some of you might have heard of a family of numbers called “Busy Beavers” (they were introduced by Rado in 1962).

I think these numbers still hold the record for the biggest number series ever imagined, and the way they are defined - which has a lot to do with computability theory - is fascinating:

Imagine the simplest Turing machine (http://en.wikipedia.org/wiki/Turing_machine#Informal_description), which would read its instructions on a tape.

Then, for a given N, feed this machine all possible programs that can be coded with N instructions on the tape. Out of these programs, some will never end, and some will halt at some point. Of those that halt, consider the one that halts after the greatest number of steps (which you could see as processor cycles): this particular program is called the “Busy Beaver”, and we can then define BusyBeaver(N) as the number of steps it takes before it halts.

So, how big is BusyBeaver(N) (or BB(N))?
As a matter of fact, it is big, very big: the values of BB(N) are known for N=1…4, while the values for N=5 and higher are still unknown and may well be out of reach of any human brain or computer. It has been proved that BB(6) is higher than 8,690,333,381,690,951, and it might well be much greater.
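To make the definition concrete, here is a brute-force sketch (mine, not from the comment) that computes the 2-state, 2-symbol value by enumerating all 20,736 possible transition tables and running each machine with a step cap. The cap is a cheat: it happens to be safe for 2 states because the longest halter is known, but in general deciding which machines never halt is exactly the Halting Problem:

    from itertools import product

    def run(machine, max_steps=100):
        """Run a 2-state, 2-symbol Turing machine from a blank tape.
        Returns the number of steps if it halts within max_steps, else None."""
        tape = {}
        pos, state = 0, 0
        for step in range(1, max_steps + 1):
            write, move, nxt = machine[(state, tape.get(pos, 0))]
            tape[pos] = write
            pos += move
            if nxt == 'H':        # the halting transition counts as a step
                return step
            state = nxt
        return None               # step cap hit; treated as non-halting

    # Every transition table: for each (state, symbol) pair, choose what to
    # write (0/1), which way to move (-1/+1), and the next state (0, 1, or halt).
    keys = [(s, r) for s in (0, 1) for r in (0, 1)]
    options = list(product((0, 1), (-1, 1), (0, 1, 'H')))

    best = 0
    for table in product(options, repeat=4):
        steps = run(dict(zip(keys, table)))
        if steps is not None:
            best = max(best, steps)

    print(best)  # 6 - the known value of BB(2), steps variant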

Now, suppose that one particularly gifted programmer writes a complex mathematical library that can handle arbitrarily large numbers and advanced arithmetical operators (such as Knuth’s up-arrow notation, http://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation). Suppose then that this programmer uses this library to write the best possible program, the one that outputs the largest number anyone ever thought of. To be fair, let’s also accept that the execution of this program might require billions of years or more.

It follows from Turing’s work that this result will fall far (by galactic dimensions) below the Busy Beaver values: as a matter of fact, BB(N) eventually beats any computable function!

For those who are interested, here is a recommended link for more information about these numbers: http://www.scottaaronson.com/writings/bignumbers.html

And an extract which explains this in detail:

Turing proved that this problem, called the Halting Problem, is unsolvable by Turing machines. The proof is a beautiful example of self-reference. It formalizes an old argument about why you can never have perfect introspection: because if you could, then you could determine what you were going to do ten seconds from now, and then do something else. Turing imagined that there was a special machine that could solve the Halting Problem. Then he showed how we could have this machine analyze itself, in such a way that it has to halt if it runs forever, and run forever if it halts. Like a hound that finally catches its tail and devours itself, the mythical machine vanishes in a fury of contradiction. (That’s the sort of thing you don’t say in a research paper.)

Permalink for that Ned Batchelder post: http://www.nedbatchelder.com/blog/200709.html#e20070909T081225

although “Yotta getta life” dude

Sean: They’re not really exposed to it - simply because “that drive has 120 and that drive 160”. 160 of what? They don’t really know - except that one can hold more illegally downloaded movies.

Frankly, users should be seeing size expressed as either “30 hours of video” so they can understand it, or the commonly accepted industry standard (which, sadly, is decimal for HDDs and binary for everything else) so products can be fairly compared (because you just know that one vendor’s hour of video will be at a different bitrate to another’s).
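That last point is easy to quantify; the bitrates below are my rough assumptions, not any vendor’s official figures:

    capacity_bits = 120e9 * 8  # a "120 GB" (decimal) drive, in bits

    for label, mbit_per_s in (('DVD-quality MPEG-2', 5),
                              ('720p HD', 8),
                              ('1080p HD', 15)):
        hours = capacity_bits / (mbit_per_s * 1e6) / 3600
        print(f'{label}: ~{hours:.0f} hours')

    # DVD-quality MPEG-2: ~53 hours
    # 720p HD: ~33 hours
    # 1080p HD: ~18 hours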

It’s actually unfortunate that end-users could end up having to also deal with this messy “rename”, along with software developers.

And if I may trigger a linux-Windows flamewar for a moment: This is why Linux won’t be a consumer operating system - because it has the attitude that this whole messy rename thing should be exposed to end-users. The same applies to Wikipedia. I’d say it’s a form of elitism.