Gigabyte: Decimal vs. Binary

The 1.44 MB 3 1/2" floppy shows the problem best:

Unformatted capacity: 2 MB (measured the way hard drive makers measure, so 2,000,000 bytes)

Formatted capacity = 1,474,560 bytes
or, as 1024 × 1024: 1.41 MiB
or, as 1000 × 1000: 1.47 MB
but the quoted "1.44 MB" is the size in units of 1000 × 1024!
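
To see all three readings side by side, here's a quick Python check (the 80 × 2 × 18 × 512 sector geometry is the standard layout for these disks, added here for illustration):

```python
# The 1.44 MB floppy, three ways (values from the comment above).
formatted = 80 * 2 * 18 * 512      # tracks x sides x sectors x bytes = 1,474,560

print(formatted / (1024 * 1024))   # 1.40625  -> "1.41 MiB" (binary)
print(formatted / (1000 * 1000))   # 1.47456  -> "1.47 MB"  (decimal)
print(formatted / (1000 * 1024))   # 1.44     -> the quoted "1.44 MB", mixed units!
```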

The hardware people love this because it makes their drives look bigger.

The software people (especially OS people) love this because it gives them an excuse for the large difference between the advertised size and the usable capacity. As above, the advertised size is 2 MB, but the usable capacity, however you express it, is less than 1.5 MB. (Microsoft's DMF-formatted floppies held 1,763,328 bytes, or 1.68 MiB, but that's still well short of the total capacity.)

I’m just waiting for the hard drive manufacturers to define a byte as 10 bits.

That’s essentially what this whole argument comes down to. Bytes are arrangements of bits, and kilobytes are arrangements of bytes. Megabytes are arrangements of kilobytes, bytes, and bits; and so on. It doesn’t make any sense to arrange bytes in decimal values because they are themselves a binary quantity: 2^3 bits.

We could, as programmers, abstract it all to decimal for the end user. The question is whether or not we should, just to legitimize the choice of the hard drive manufacturers (one of the very small number of groups that actually distort these values). The end users don’t care a bit until they buy a hard drive that turns out to be smaller than they thought it would be.

Another area in which this is often done is with network speeds, especially in the world of dial-up modems. Unfortunately for most people, these devices rarely deliver even the full speed promised in decimal (because of the network they’re on, usually), so it’s rarely an issue that people face with their newly purchased modem or NIC.

The worst is Broadband …

That lovely 10 Meg broadband connection: how fast is it?

10 MB per second …?
10 MiB per second …?

No: 10,000,000 bits per second …
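
For anyone who wants the arithmetic, here's a minimal Python sketch of what that advertised rate works out to:

```python
# "10 Meg broadband", converted (advertised rates are decimal bits/s).
bits_per_second = 10_000_000
bytes_per_second = bits_per_second / 8      # 1,250,000 bytes/s

print(bytes_per_second / (1000 * 1000))     # 1.25  MB/s  (decimal)
print(bytes_per_second / (1024 * 1024))     # ~1.19 MiB/s (binary)
```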

I don’t think it’s a “trick” that storage manufacturers use … it’s simply the well-established tradition. And besides, it is correct. Hard to fault them for being right.

I’ve been writing low-level software that deals with storage (drivers, filesystems, CD/DVD burning, etc) for about ten years now. In order to stay sane, I have become pedantic about it and always specify binary vs decimal KB/KiB, MB/MiB, GB/GiB, etc. In verbal communication I don’t bother so much, but you really have to be explicit in code and other written communications.

The only problem I have with the whole thing is that it’s not an easy 1:1 mapping. The new prefixes have the virtue of being unambiguous: when someone writes “2 GiB” it’s perfectly clear what they mean. But the old prefixes haven’t been fixed. When someone writes “2 GB” you have to consider the context and decide whether they mean 2.00 decimal GB or 2.00 GiB = ~2.15 decimal GB. Ugh.
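
A tiny Python sketch of that ambiguity (the `as_bytes` helper is made up purely for illustration):

```python
# The two readings of "2 GB".
def as_bytes(gigabytes, binary):
    """Interpret a GB figure as bytes, using binary (2**30) or decimal (10**9) GB."""
    return gigabytes * (2**30 if binary else 10**9)

print(as_bytes(2, binary=False))         # 2,000,000,000 bytes
print(as_bytes(2, binary=True))          # 2,147,483,648 bytes
print(as_bytes(2, binary=True) / 10**9)  # ~2.15 decimal GB, as noted above
```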

Magnetic disks tend to store files in multiples of 512 bytes (sectors). Optical discs tend to use 2048-byte data sectors. Powers of 2 aside, the capacity is always a multiple of 2048, or at least of 512. Any disk with a capacity below 1 MB is probably using 2^10 even if it says KB.

It looks like that’s about where they switch from 2^10 to 10^3: http://www.buildorbuy.org/floppydisk.html

Although I understood the difference between SI and IEC units, this still bit me when I was creating disk images for installing to flash (CompactFlash, USB keys, etc.). Even though flash is based on powers-of-2 logic internally, manufacturers use different amounts of space for wear-leveling and so on. So if you use the maximum space available on one flash device, the image may be too big for another flash device. The only safe thing to do is to use only the SI space; i.e., if you have a 64 MB flash, only use 64,000,000 bytes of it.
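
A minimal sketch of that rule in Python (`safe_image_bytes` is a hypothetical helper; the 64 MB figure is the one from this comment):

```python
# Treat the SI (decimal) reading of the advertised size as the floor.
def safe_image_bytes(advertised_mb):
    """Largest image size, in bytes, to assume for a flash device advertised in MB."""
    return advertised_mb * 1000 * 1000

print(safe_image_bytes(64))   # 64,000,000
print(64 * 2**20)             # 67,108,864 -- too big for some devices
```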

I find the unix units command handy for these conversions:
units -t '500GB' 'GiB'
http://www.pixelbeat.org/cmdline.html#math

Jaster, the broadband situation is even worse than that.
Here's a handy calculator so you can switch between MB/s, MiB/s, …:
http://www.pixelbeat.org/speeds.html

When I was 10, I remember my dad trying to explain to me the relative capacity of the 20 MEGABYTE hard drive he got in a new computer. I asked him if it was possible to ever fill up that much space on a hard drive. He said that, practically, it was not possible.

Then when I went to high school, a friend of mine had a father with an Audio/Video production facility, and he told me he had an external 1 GB drive (he pronounced it “Jigga-byte”). I nearly fainted at the sheer magnitude of drive space.

Given all the confusion here in the comments section of this blog, I think making a distinction between kilo (=1000) and kibi (= 1024) is very useful.

In our company, we have been doing this for several years, and it has saved us from a few embarrassing errors in our software.

For those that do not like the word ‘kibi’: now that is what I call a relevant argument. Get over it.

@Verizoth

If they could, hard-drive manufacturers would be much more likely to redefine a byte as 5 bits. This gives a bigger number on the front of the box. *cynicism*

19"(17.4" Viewable), 500GB (465GB Usable).

Ah yes, excellent example.

The discrepancy on monitor size only existed for CRTs, because the CRT tube itself was always partially obscured by the bezel of the monitor, and not all of the tube could be used for display phosphors anyway. So a 19" CRT tube ends up with 17.4" of viewable space after you factor this stuff in.

Now that we’ve all pretty much switched to LCDs, this is a moot point. LCD monitors don’t use tubes; every inch of the flat panel (well, probably 99% of it) is filled with RGB elements visible from edge to edge. Thus, a 19" LCD is, by definition, a 19" viewable LCD. :)

@Jaster
Wrong answer!
10 megabits/s = 10 485 760 bits/s

It seems totally absurd to redefine well-established units of measurement in order to create metric units. This is like saying that from now on, a mile will be 1000 yards: get over it.

A more appropriate course of action would have been to define new metric units of measurement.

kibibytes?

I remember when drives were defined using power-of-2 designations. The manufacturers definitely changed at some point. Problem is, if one changes, they all have to change, or their drives look smaller in comparison.

The problem, for those who ask, is that computer memory is always defined with the power-of-2 system. So, there’s a mismatch. Back when disk drives were close to the size of RAM in your machine it mattered more, I guess.

Thanks for the info, Jeff. Hadn't seen the IEC power-of-2 prefixes before. Think I'll start using them just to confuse everybody I know.

Besides, a company that sold good CRT monitors rarely made it even remotely hard (they usually printed it on the box, though generally in smaller print) to find out what the viewable area is on the monitor.

Half the time you have a pretty hard time finding out what the actual formatted size of the hard drive is before you put it in your computer and format it (even if you’re using a very common file system).

In my personal use, I rarely run into issues with this sort of thing, except when I have to explain to someone why their new 500GB hard drive can't actually hold 500GB. Usually just explaining powers of 10 vs. powers of 2 is enough for them, without going into the gritty details, since Windows still displays drive space in powers of 2 (e.g., my primary partition is listed as 89,589,747,712 bytes - 83.4 GB - in the drive properties dialog).
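
For the curious, a rough Python sketch of that binary-prefix display (`format_binary` is a made-up helper mimicking the Windows convention described here):

```python
# Format a byte count the way the drive properties dialog does (powers of 2).
def format_binary(n_bytes):
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if n_bytes < 1024 or unit == "TB":
            return f"{n_bytes:,.1f} {unit}"
        n_bytes /= 1024

print(format_binary(89_589_747_712))   # "83.4 GB", matching the example above
```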

Hey Now Jeff,
I’m so glad I read this post; I always wondered why there was space missing. Now I know the reason why the drives show up as less space. I discovered your great blog through a shrinkster link on a DNR show, not googol.
Thx,
Catto

@Tema

No, he is correct. Network speeds have always been in bits per second, and have never used powers of two. Your old 28.8k modem was 28,800 bits per second, if it could negotiate that rate over a potentially noisy phone line. And those bits were the signaling speed of the line, which usually included framing and error protection and detection overhead. Even with RS-232 signaling, for example at 9600, you have overhead for a start bit and a stop bit (assuming 8 bits per word, no parity, 1 stop bit).
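
Putting numbers on that 8N1 overhead, a quick Python sketch:

```python
# RS-232 at 9600 with 8N1: 10 bit times on the wire per 8-bit byte.
line_rate = 9600                      # bits/s of signaling
bits_on_wire_per_byte = 1 + 8 + 1     # start + 8 data + stop

print(line_rate / bits_on_wire_per_byte)   # 960.0 payload bytes/s
print(line_rate / 8)                       # 1200.0 "raw" bytes/s -- 20% lost to framing
```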

Personally, I think software should start to standardize on using the SI meanings to display sizes of things (file sizes, drive sizes, download speeds, et cetera). Those things are almost never an even power of two. The only place it doesn’t make sense to use that notation is total size of memory that is inherently a power of two because of how the hardware is made (e.g., CPU cache, system memory). That’s the only place I can think of you’d need some kind of qualifier on the spec sheet or on the package.

I can handle the drive-capacity issue, but I wonder why we don’t use the SI prefixes for RAM?

Great, now we have to deal with imperial metric measurements. d’oh!

In the longer term having a measurement which is 1024 instead of 1000 will seem as silly as having 12 inches in a foot or 3 feet in a yard.

The simplest solution that I can see is that everyone switches to K=1000. Your average Joe wouldn’t notice the difference.

-Andrew

Sean: why list file sizes in KiB, rather than KB? Because even though hard disk manufacturers have squatted on the traditional descriptions of size like dogs in mangers, hard disks are still naturally sized in 512-byte units. So all file sizes are rounded up to multiples of at least 2^9 - if not more, when one considers clusters of blocks.
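
A small Python sketch of that rounding (`on_disk_size` is a hypothetical helper; real filesystems usually round to clusters of several sectors, so the waste is larger):

```python
# Files consume whole 512-byte sectors on disk.
SECTOR = 512   # bytes = 2**9

def on_disk_size(file_bytes, sector=SECTOR):
    """File size rounded up to a whole number of sectors (ceiling division)."""
    return -(-file_bytes // sector) * sector

print(on_disk_size(1))      # 512
print(on_disk_size(513))    # 1024
print(on_disk_size(4096))   # 4096
```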

I think that’s what galls me the most - hard drive manufacturers aren’t even using the best units for their products in their keenness to pull a fast one on their customers.

Meanwhile, Sean, have you noticed that you’re the only person mounting a strident defence of the new way of doing things - almost to the point of telling anyone over 25 that they’re brain-damaged…? Methinks thou dost protest too much.

As for the question of Mbits/s, once upon a time there was a unit that naturally encapsulated the “bits per second” measurement; it was called baud. I remember 300 baud modems; somewhere around the 14.4 era, Kbaud (which, as has correctly been stated, was always a decimal measure, having long predated the era of binary computers) suddenly became Kbps. If one were to refer to “gigabit Ethernet” as “gigabaud Ethernet” instead, the confusion goes away. (No, it’s a complex unit - so is the volt, but nobody talks about joules per coulomb.)

So I suggest that the status quo is just fine:

A megabyte is 2^20 bytes, the natural measurement for memory.
A megabyte per second is the natural measurement for data transfer across parallel buses.
A megabaud is 10^6 bits per second, the natural measurement for data transfer across serial lines.

There. What’s the problem? What’s ambiguous about it? Why should we change the way things have been done for decades just because hard drive manufacturers are greedy? (They’ve always been greedy. Remember “unformatted capacity”, anyone?)