Just a Little Bit of Software History Repeating

“Computer systems that have to interact with physical entities in the physical world have complexity caused by the unpredictability of the stuff that they deal with.”

This is so true - I work on a Warehouse Management System that tracks stock in, around, and out of a warehouse. There’s just no way to code around what the users physically do with an item. If there’s a db timeout, I can tell them to put it back, but if they don’t read the screen, or put it back in the wrong place, things can get messy.

Humans are, by their very nature, not like computers. They don’t think like them and their ability to learn is what makes them different. People usually feel constrained when working with computer systems/terminals because they almost always know a “better/faster” way to accomplish things. People have to change how they work to accommodate the machine, whereas really the machine should facilitate the desires of the people.

Tough stuff.

“Blame that solely on the well-greased politicians, not on a “software project failure”. Cut the programmer community some slack, Jeff. :)”

The post isn’t necessarily pointed at the programmers, I wouldn’t say (Jeff can correct me if I’m wrong). The point of the post as I read it is that the designers and producers of a system failed to apply the rules while they were doing so, and that this is a scenario often paralleled in the software development world.

You see it all over, it’s just that Heathrow’s T5 shambles is big(ish) news.

you need to test the system with the people who are actually going to use it

And just how were the bags in T1 supposed to be handled whilst everyone was training over in T5? You can’t make baggage handlers pull a double shift just to learn the new systems (they’d strike just at the mere mention of it). That’s the problem with Mike’s analogy too - the actors aren’t still performing their previous show at the same time as dress rehearsals for the new one - they’ve got time to adjust.

I don’t see any way that full-scale testing could be done in this situation.

It always amazes me how people refuse to learn from the mistakes of others. Automated sorting and delivery systems are nothing new. They have been around for years. FEDEX and UPS do it daily. I even worked on similar systems for newspapers.

Any system deployment for a complex problem can degenerate into this mess. I hope they let the world know what really went wrong, so we can all learn. I read that “The problems appear to be due to a combination of factors” (BBC). While this is pretty obvious for a complex system, I still don’t understand why they jump-started the new terminal with such a big load on the very first day. They even persisted in doing so for a few days, and only stopped when the undelivered baggage had reached 15,000. Why not increase the load more gradually? Handling just a couple of flights during the morning of the first day should be good enough to catch many errors. Why the rush? If the UK is not too dissimilar to other parts of the world, I suspect downward pressure from high-level management, and upward pressure from the natural limitations of complex systems. I sympathize with all the sleepless engineers and technicians in the middle.
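
To make the "increase the load gradually" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the function names, the doubling growth factor, the backlog threshold, and the toy backlog model are all hypothetical and are not taken from how T5 or any real airport actually operates.

```python
def ramp_schedule(full_load, start_load, growth=2.0):
    """Return planned daily loads, growing geometrically up to full_load."""
    if start_load <= 0 or growth <= 1.0:
        raise ValueError("start_load must be positive and growth must exceed 1")
    loads = []
    load = start_load
    while load < full_load:
        loads.append(int(load))
        load *= growth
    loads.append(full_load)  # the final step is always the full load
    return loads


def run_ramp(schedule, backlog_after, max_backlog):
    """Step through the schedule and halt as soon as the backlog is too large.

    backlog_after is a hypothetical callback: given a day's load, it returns
    the number of undelivered bags at the end of that day.
    Returns (loads completed successfully, load at which we halted or None).
    """
    completed = []
    for load in schedule:
        if backlog_after(load) > max_backlog:
            return completed, load
        completed.append(load)
    return completed, None


if __name__ == "__main__":
    # Illustrative numbers only: 400 flights a day at full load, starting at 10.
    plan = ramp_schedule(full_load=400, start_load=10)
    # Toy model: the system copes up to 80 flights, then loses 5 bags per flight.
    toy_backlog = lambda load: 0 if load <= 80 else load * 5
    print(plan)
    print(run_ramp(plan, toy_backlog, max_backlog=500))
```

The point of the sketch is only that a ramp exposes the breaking point at a small load, where the backlog is still recoverable, instead of at full load on day one.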

Another problem with Denver’s airport (and possibly Heathrow as well) is that people were actively sabotaging the machines. Why? Because if it worked, the machine would have put many baggage handlers out of work.

People do respond to incentives.

Speaking to the earlier point about theatre, as someone with actually more of a background in that than in technology…

I often say that being a theatre practitioner has given me a good appreciation for deadlines: Deadlines in theatre are absolute. You absolutely cannot go on stage on opening night and tell people oh we’re sorry, we aren’t ready yet, please come back in a week or so. The day you say you’re going to open is the day the curtain opens at 8:00 PM.

However, there are two major differences to keep in mind between a show and, for example, a baggage handling facility (or any large, complex technology system like the ones I’ve worked on):

For one thing, a whole lot of the process of doing a show is human, and those humans are very used to thinking on their feet and dealing with things on the fly when they get messed up. I’ve seen people skip whole acts in the script, fail to bring out a gun that is used to shoot the person that the murder mystery is based on, break limbs onstage, whatever, and the show still goes on. Just another story to tell in the pub. (Lesson: Build enough slack in the process that things can go wrong and you can still complete the task)

The other thing is that with very few exceptions, the audience doesn’t care because they are on the one side not very observant, on the other side they don’t know how you planned on doing it to begin with so they don’t notice if things are different from plans, and on the third side, if the rest of the show is good, they are extremely forgiving of a flub in act one. (Lesson: make everything as slick as possible, so that when things inevitably go wrong, people are ok because the rest of your process makes up for it)

In Madrid we recently experienced the disaster of Barajas T4. For weeks, maybe months, it was chaos. It seems that it mostly works now, though. I’ve read that a Spanish company is behind the London system, so I would be cautious because: if it’s the same system that was implemented in Madrid, then it has been tested… somehow :) and it will eventually work :D If I understood it correctly, Denver had to be dismantled.

The thing about airports is that they suck and they always will suck.

Are baggage handling systems an eternal curse?

Not if they’re done right. I used to work for a company that had a division that did airport baggage handling systems, run by (depending on your point of view) either an extremely careful or an extremely anal-retentive project lead. His team spent more than six months not writing a single line of code but building a complete architectural blueprint for the system and simulating it in full detail. It was almost like watching a software engineering textbook come to life (although the developers didn’t like it much because they couldn’t leap in and hack out code).

Their work came in on time and under budget, and worked perfectly at another international airport at about the same time that Denver was melting down. The subsequent flood of orders from other airports almost overwhelmed them (I think they went through a complex series of mergers with other companies, so I’m not sure of the quality of their current product). It’s a pity it’s not documented anywhere since it’d make a (rare) example of someone doing what you read about in SE textbooks but rarely see in practice.

I always tell my clients to start small, but think big. You introduce functionality bit-by-bit, but with the idea that you can include new functionality and features as you work out bugs and unforeseen glitches.

FedEx and UPS do run major airport facilities that process packages, and that is much more complex than anything any airport handles. They do it efficiently and quickly. Every day, hundreds of planes fly to the FedEx facility in Memphis, where over half a million packages each hour are taken in at one end, processed, and spewed out the other to hundreds of outgoing flights. It is an amazing sight. However, both UPS and FedEx started off on much smaller scales. 100 years ago, UPS was just a local delivery company delivering store packages. They added functionality and complexity a bit at a time until their sorting facilities could do some amazing stuff.

The problem both Denver and London had was the idea that they could put everything together in one big bang. And, when that didn’t work, they had to scrap everything. What would have happened if they put together a new automated system piece by piece? Maybe the system could first handle inter-airline transfers, then once everything gets worked out, add handling airline-by-airline until the whole system works.
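
As a rough illustration of that piece-by-piece approach, here is a small Python sketch of a gated rollout: each stage only goes live if every stage before it has stayed under an acceptable mishandling rate. The stage names, the numbers, and the 1% threshold are all made up for the example, and `Stage` and `stages_cleared` are hypothetical names, not part of any real baggage system.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    bags_handled: int
    bags_mishandled: int

    @property
    def error_rate(self):
        if self.bags_handled == 0:
            return 1.0  # no evidence yet, so treat the stage as not ready
        return self.bags_mishandled / self.bags_handled


def stages_cleared(stages, max_error_rate):
    """Return the names of stages that may go live, in order.

    Stops at the first stage whose error rate exceeds the threshold, so a
    later stage never goes live on top of an earlier one that is failing.
    """
    cleared = []
    for stage in stages:
        if stage.error_rate > max_error_rate:
            break
        cleared.append(stage.name)
    return cleared


if __name__ == "__main__":
    # Illustrative trial results only.
    plan = [
        Stage("inter-airline transfers", 1000, 5),
        Stage("airline A", 2000, 10),
        Stage("airline B", 1500, 90),
        Stage("airline C", 1000, 1),
    ]
    print(stages_cleared(plan, max_error_rate=0.01))
```

In this toy run the rollout stops before "airline B", and "airline C" waits even though its own numbers look fine, because the stage ahead of it has not been worked out yet.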

Whenever I hear of a major project that will change everything and be implemented all at once on a grand scale, I tell myself there’s a project doomed to failure. Remember that the Internet itself started out as the ARPANET, a network of just four nodes, and there was no such thing as email, webpages, or blogs. You start out small, get the basics working, and then scale up.

Ben: “And just how were the bags in T1 supposed to be handled whilst everyone was training over in T5?”

I’m not familiar enough with the problem to know, but I’d presume that they didn’t close T1 when T5 opened, did they? So therefore they had handlers at both terminals simultaneously.

Why couldn’t this have been done for testing as well?

It’s funny:

The customers who participated in the trials said it was flaky. The staff who worked on it said it needed more work.

The software provider is based in Canada (when there are literally thousands of competent local software companies in that area). This software is used in a different context by a friend of mine who has zero confidence in it - a little hunting on the net could probably have found this out.

The management went live anyway because they didn’t believe their punters or their staff.

Sound familiar?

Pointy-haired wigs for anyone?

“But how do you simulate 15,000 pieces of luggage that each weigh 30 - 70 lbs?”

Easy, just use the huge amounts of baggage that are already sitting in a warehouse at Heathrow because of previous baggage handling failures, making it impossible to identify their owners.

It’s hilarious - they’re even asking people flying from T5 to travel without any luggage “if possible”… Yeah, like we’re going to fly from London to Sydney for a couple of weeks without any luggage…

Thanks for mentioning Denver in '94 Jeff - as a UK citizen I’m well aware of the T5 problem, but had no idea that there was a previous similar problem elsewhere… should make for some interesting conversations!

Make Small Changes.

Always.

Big changes are a very bad idea. That’s why Agile methods are good. That’s why big code rewrites are bad. It’s why T5 broke, why Google doesn’t do big product launches, why Vista bombed, why the internet never fell over, why Britain does OK with an ancient and constitutionless legal system. Organic growth is by far the safest. Why?

The Law of Unintended Consequences always wins.

Smaller leap, less chance of falling foul of it. I like this about Agile - the iterations are smaller, the bugs easier to find and more recently created.

I recall one report from the Denver baggage fiasco, where an expert said the amazing thing wasn’t that it didn’t work; it was that the people who designed it ever thought it could work. We make models of the problem, and if it’s big and complex we make a model that’s simpler than reality. If our model doesn’t get updated until the project rolls out, we’re dead.

The root cause of the problem is that you will always have idiots sitting in the pilot seat making all the critical decisions.

Someone from Ops should have been fired - if they can’t test a system prior to going live when PEOPLE depend on it, they deserve to be fired.

Kashif

“It’s hilarious - they’re even asking people flying from T5 to travel without any luggage “if possible”… Yeah, like we’re going to fly from London to Sydney for a couple of weeks without any luggage…”

RWW, I think David came up with an answer:

"FedEx and UPS do run major airport facilities that processes packages that is much more complex than anything any airport handles. "

FedEx your bags to Australia.

“It always amazes me how people refuse to learn from the mistakes of others. Automated sorting and delivery systems are nothing new. They have been around for years. FEDEX and UPS do it daily. I even worked on similar systems for newspapers.”

The thing with FEDEX and UPS is that most of the items they ship have a similar design. They have 6 flat sides, with the information on one of the two largest sides, or they are big envelopes. No matter what, you have a flat surface to read.
Just think of the last time you saw baggage at an airport and all the shapes and sizes that people used. There is no set shape for the items, and even then there is no known location where the tag will be. Then with a suitcase you have the tag on the handle, which can be twisted around, folded, etc., which makes it hard to read.