Why Do Computers Suck at Math?

Interestingly, the launch failure of the Ariane 5 rocket, which
exploded 37 seconds after liftoff on June 4, 1996, occurred because
of a software error that resulted from converting a 64-bit floating
point number to a 16-bit integer. The value of the floating point
number happened to be larger than could be represented by a 16-bit
integer. The overflow wasn’t handled properly, and in response, the
computer cleared its memory. The memory dump was interpreted by the
rocket as instructions to its rocket nozzles, and an explosion
resulted.

As Vincent already pointed out, this is nonsense. What actually happened was that code from the Ariane 4 project was reused. The special thing about it was that it had been proven mathematically that the input values could never exceed the representable range, so there was no need to check those boundaries.

Ariane 5 - which is much bigger than Ariane 4 - produced values that exceeded this range, triggering an exception in the Ada code and the shutdown of that computer. The secondary system, which took over and processed the same data, shut down as well after hitting the same exception, leaving Ariane 5 without a working stabilization system. The explosion was initiated remotely because the rocket was becoming uncontrollable.
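For anyone curious what that failure mode looks like in code, here is a rough C# analogue - the variable name and value are made up for illustration, and the real system was written in Ada, not C#:

[c#]
// Hypothetical analogue: a value that fit comfortably in 16 bits on Ariane 4
// but not on the bigger Ariane 5.
double horizontalBias = 40000.0;   // illustrative value, outside Int16 range

try
{
    short converted = checked((short)horizontalBias);  // out of range: throws
    Console.WriteLine(converted);
}
catch (OverflowException)
{
    // On Ariane 5 this exception went unhandled and the unit shut itself down.
    Console.WriteLine("Conversion overflowed.");
}
[/c#]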

Hope that clears things up.

Regards, Lothar

Mathcad - by PTC - gets all these values right every time. I suspect Matlab, Maple and others do too.

That’s why I use applications specifically designed for calculation when accuracy is required. Excel is a spreadsheet, and while powerful, it doesn’t have the rigor of a dedicated mathematics engine, nor does it have a symbolic engine.

I love that joke by Paul.

I think that a lot of the problem comes from 2 places.

  1. Trying to fit such a large number of values into such a small space.

  2. Using binary floating point rather than decimal floating point.

Firstly, a standard 32-bit float can represent a value as large as about 3x10^38 and as small in magnitude as about 1x10^-45. That’s a really big range. Unless you’re dealing with astronomy or particle physics, you will never need numbers that big. 99% of people would probably be fine with a smaller range of numbers if they were more accurate. For the numbers that most computers deal with on a daily basis, 10^16 would probably be more than enough.
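A quick C# sanity check of those limits, using .NET’s single-precision constants:

[c#]
// 32-bit float: huge range, but only about 7 significant decimal digits.
Console.WriteLine(float.MaxValue);    // about 3.4E+38
Console.WriteLine(float.Epsilon);     // smallest positive float, about 1.4E-45

float big = 16777216f;                // 2^24
Console.WriteLine(big + 1f == big);   // True -- the +1 is below the precision floor
[/c#]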

Then there’s the issue of using binary floating point. This means we can’t even represent a number like 0.1 exactly. A lot of common numbers can’t be represented exactly. If we used a representation that mapped more closely to the numbering system we actually use, there wouldn’t be so much of a problem.
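The classic demonstration in C# (double is binary floating point; decimal stores base-10 digits):

[c#]
double sum = 0.1 + 0.2;
Console.WriteLine(sum == 0.3);            // False
Console.WriteLine(sum.ToString("R"));     // 0.30000000000000004
Console.WriteLine(0.1m + 0.2m == 0.3m);   // True -- decimal has no trouble with 0.1
[/c#]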

So, I think we should really be using something more like what databases offer in our programming languages - something like the Decimal data type. You can declare a field as Decimal(18,4) and know that you can represent any number with up to 14 digits before the decimal point and 4 digits after it. You know exactly which numbers you can represent, and anyone can understand that without getting into the complexities of converting between binary and decimal representations.
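In C#, System.Decimal is a rough stand-in for that kind of column - base-10 digits, so the precision you see is the precision you get:

[c#]
// Roughly a Decimal(18,4): 14 digits before the point, 4 after, all exact.
decimal rate = 0.0001m;                   // scale of 4, stored exactly
decimal scaled = rate * 12345678901234m;  // 14 digits before the point
Console.WriteLine(scaled);                // 1234567890.1234
[/c#]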

0.2 (or any non-integer multiple of it) is to binary what 0.3333333… is to base 10: a repeating fraction that can never be written out exactly.

This causes a ton of problems and is why any loop over floating-point values should use &lt;= (or &gt;=) to stop the loop, and not ==.
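For example, ten steps of 0.1 never add up to exactly 1.0 in a double, so an equality test misfires:

[c#]
double x = 0.0;
for (int i = 0; i < 10; i++)
    x += 0.1;                        // accumulates a little rounding error each step

Console.WriteLine(x == 1.0);         // False
Console.WriteLine(x.ToString("R"));  // just short of 1, e.g. 0.9999999999999999
// A loop condition of "x == 1.0" would never be met; use < or <= instead.
[/c#]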

Also, about .999999999 being equal to 1, consider this.

1/9 = .1111111111…
2/9 = .2222222222…
3/9 = .3333333333…
4/9 = .4444444444…
5/9 = .5555555555…
6/9 = .6666666666…
7/9 = .7777777777…
8/9 = .8888888888…
9/9 = .9999999999… (WAIT 9/9 = 1 doesn’t it?)
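The same conclusion drops out of summing the series directly:

$$0.999\ldots \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}} \;=\; 9 \cdot \frac{1/10}{1 - 1/10} \;=\; 1$$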

Tim’s post is compelling.

*thinks some chum has been thrown in the water as he watches the sharks circle*

[c#]
const float nine = 9;

for (float i = 1; i < 10; i++)
{
    Console.WriteLine("{0}/9 = {1}", i, i / nine);
}

Console.ReadLine();
[/c#]

This outputs:
1/9 = 0.1111111
2/9 = 0.2222222
3/9 = 0.3333333
4/9 = 0.4444444
5/9 = 0.5555556
6/9 = 0.6666667
7/9 = 0.7777778
8/9 = 0.8888889
9/9 = 1

I have had this issue. As a research scientist I find errors in math all over the place. My TI-89 is one of the worst offenders. Laboratory results can have huge errors because of this :frowning:

This may all be news to those who did not have a computer science education.

Sharks are jumping all over the place.

I also agree that (once again) Jeff misunderstands or over-simplifies - i.e. the Ariane comment.

It can only be attributable… to human error.

No proof that 0.9999… = 1 is needed because they are equivalent by definition of the reals. :slight_smile:

Of course this doesn’t help most people upon first encountering this conundrum, but it’s a reflection of the extent to which mathematics is a creation of humans. (Viz. the famous quote by Kronecker.) Perhaps a limit on the neo-Platonist view of mathematics.

You FAIL Jeff.

The correct question to ask the Windows Calculator is what is the difference between Windows 3.11 and 3.1?

The answer: nothing.

Your post reminds me of a joke that I heard not long ago…
An infinite number of mathematicians walk into a bar. The first one orders a beer. The second orders half a beer. The third, a quarter of a beer. The bartender says “You’re all idiots,” and pours two beers.

@Darren - Most coders have had just as much math as traditional mathematicians, and I’m not sure where your school is, but when I attended the University of Michigan it was taught that .999… infinitely repeating is exactly equal to one.

http://answers.yahoo.com/question/index?qid=20070821213704AASMKu0

I really don’t think it has anything to do with the programmers sucking at math; it has to do with them sucking at testing. No one is going to anticipate that the computer will give a mathematical error on any particular line of code. Rather, we need to anticipate that it’s going to do something unexpected SOMEWHERE due to a bug in our code or in the runtime we’re using. The crime isn’t that the Ariane programmers didn’t realize the loss of precision with the numeric conversion, it’s that they failed to create a test approximating real-world data that would have revealed the problem before a billion-dollar spacecraft blew up.

Hmm, doesn’t seem anyone’s covered this angle yet… I think the reason there isn’t a widespread solution to this problem built in to every programming language yet (as someone else said, programmers aren’t clamoring for it) is because the vast majority of programmers don’t know this problem exists. If more people knew about it, maybe they would be. Imagine if it were common knowledge among business executives that any result coming out of a spreadsheet is suspect, depending on what numbers you used. It’s a dirty little secret amongst us programmers.

It is, after all, ridiculous that we as people are more accurate at storing high-precision numbers than computers are. How do we do it? We use symbols to denote repetition, and we write fractions when a division doesn’t yield a zero remainder. So when we divide 9 by 2, for example, we might write: 4 1/2. Simple - a string of 5 characters including the space. No precision problems. Why in this day and age we feel we must represent this in base 10 or base 2 is beyond me.

Now before you say that strings are slow: yes, today they are. But we’re starting to see specialized processors in computers. Sound components, dedicated to that purpose. GPUs, dedicated to visuals. Why can’t a computer have a better math processing unit to make this fast?
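A minimal sketch of that fraction idea in C# - note that this Fraction type is hypothetical, not something built into .NET:

[c#]
using System;
using System.Numerics;

// Hypothetical exact-rational type: nothing is rounded until you print it.
struct Fraction
{
    public readonly BigInteger Num, Den;

    public Fraction(BigInteger num, BigInteger den)
    {
        BigInteger g = BigInteger.GreatestCommonDivisor(num, den);
        Num = num / g;
        Den = den / g;
    }

    public static Fraction operator +(Fraction a, Fraction b)
        => new Fraction(a.Num * b.Den + b.Num * a.Den, a.Den * b.Den);

    public override string ToString() => $"{Num}/{Den}";
}

class Demo
{
    static void Main()
    {
        // 1/3 + 1/6 is exactly 1/2 -- no 0.49999999 anywhere.
        Console.WriteLine(new Fraction(1, 3) + new Fraction(1, 6));  // 1/2
    }
}
[/c#]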

Math errors like this are just unacceptable to me. What it comes down to is that you can’t just trust the computer. For your average programmer to be able to write code under all the pressures we deal with every day, we just shouldn’t have to deal with such an obscure problem that can manifest itself in such scary ways, when we least expect it. I wonder how many software bugs (yet to be reproduced or fixed) are the result of some stupid math error buried so far down the abstraction chain that our puny human brains have zero chance of eradicating it?

One last point that scares me about this weakness of computers: it’s not that slightly less precision will hurt us very often. Let’s face it, infinite precision doesn’t necessarily make something more practical. The bigger problem is in equality tests, which are used as logic switches. What if I press the button (you know which button I’m talking about?) based upon whether A - B = 0? (and A or B is one of those wonky values mentioned above)

Big trouble. It’s time EVERY language had a true Number class. If we need to bake it into the hardware to get us there, fine, do it!
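Until then, the usual defensive move is to compare against a small tolerance rather than testing A - B for exact zero. A quick sketch in C# (the tolerance value here is just an example; picking it is a judgment call in itself):

[c#]
// Never gate logic on exact float equality; allow a tolerance suited to the data.
static bool RoughlyEqual(double a, double b, double tolerance = 1e-9)
    => Math.Abs(a - b) <= tolerance;

double A = 0.1 + 0.2, B = 0.3;       // the "wonky values" from above
Console.WriteLine(A - B == 0);       // False -- the scary case
Console.WriteLine(RoughlyEqual(A, B)); // True
[/c#]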

There is nothing wrong with math.

Computer hardware is binary. Software tries to cope.

What scares me is when people code money values as floating-point - I’ve even had an argument with a consultant running a Java course for Sun over this.
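A small illustration of why (C# here, though the same point applies to Java’s float and double versus BigDecimal):

[c#]
// Add one cent ten thousand times.
float f = 0f;
for (int i = 0; i < 10000; i++) f += 0.01f;
Console.WriteLine(f);        // roughly 100.003 -- the pennies have drifted

decimal d = 0m;
for (int i = 0; i < 10000; i++) d += 0.01m;
Console.WriteLine(d);        // 100.00 exactly
[/c#]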

Your audience is software developers.

Why are you explaining this?

Jeff, did you even READ the article you linked to about the Excel bug? It has NOTHING to do with floating-point error, it’s entirely a display bug. The result is computed correctly.

The problem is in the float-to-string conversion code, which was hand-written assembly for maximal speed. The bug appeared when porting it from Win16 to Win32.

With 16-bit instructions, when the AX register goes from 65535 to 65536, it overflows back to 0 and sets the x86’s overflow flag (OF) in the FLAGS register. With 32-bit instructions, when the EAX register goes from 65535 to 65536, it does NOT overflow and does NOT set the OF flag in the EFLAGS register.

For some very particular floating-point inputs, the float-to-string function increments the AX/EAX register from 65535 to 65536/0. As a result, a branch that was taken in 16-bit code which predicates on the OF flag was no longer being taken in the 32-bit code, eventually resulting in the bug.
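For anyone who wants to see that 16-bit versus 32-bit difference in miniature, here is a C# analogue (not the actual assembly; the register names are only a loose parallel):

[c#]
ushort ax = 65535;
unchecked { ax++; }        // wraps to 0 -- the 16-bit code could key off this event
Console.WriteLine(ax);     // 0

uint eax = 65535;
eax++;                     // no wrap in 32 bits; the old "overflow happened" cue is gone
Console.WriteLine(eax);    // 65536
[/c#]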