Just a Little Bit of Software History Repeating

Rick · March 31, 2008, 12:00am

I don’t think you can blame this one on software - this is more of a hardware problem. We often forget the physical side of such systems, and the fact that they will never be as precise as the software that controls them. The hardware requires constant maintenance just to keep the error rate low. Solenoids misfire, gate bearings gum up and jam, sensors get covered in dust and give false readings… and then there is simple maintenance, like cleaning shampoo from someone’s suitcase off the conveyors. The real-time controllers and other software are less than half the problem in these sorts of environments, so I hate to see them take the blame.

TZerb · March 31, 2008, 12:00am

I wonder how many software developers it takes to say “Let’s just try it with one airplane.”

Greg · March 31, 2008, 12:00am

I’d say this belongs in the engineering/management project failure hall of fame. Software is but one component here.

Steve · March 31, 2008, 12:00am

Perhaps the mistake was to think of it as a “computer problem” rather than in terms of strategy, physics, and logistics first.

Edward · March 31, 2008, 12:00am

Ha, great timing, Jeff. I’m flying to London tonight.
Cheers!

TomR · April 1, 2008, 12:00am

Why can’t they integrate the new system gradually? It seems there shouldn’t be a single day when they switch over completely. Instead, they could move more and more portions of the old system to the new one, fixing problems as they go.
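
A minimal sketch of that kind of gradual cutover, in Python. Everything here is made up for illustration: the function name, the 1% error threshold, and the 10% step are assumptions, not anything BA or BAA actually used. The idea is only that the share of flights sent to the new system grows while the mishandled-bag rate stays low, and shrinks when it doesn't.

```python
def next_share(current_share, mishandled, handled,
               max_error_rate=0.01, step=0.1):
    """Return the share of flights (0.0 to 1.0) to route to the new
    system in the next period, based on the last period's results.

    All parameters are hypothetical illustration values.
    """
    if handled == 0:
        # No bags went through the new system, so there is no evidence
        # either way: hold the current share.
        return current_share
    error_rate = mishandled / handled
    if error_rate > max_error_rate:
        # Too many mishandled bags: move traffic back to the old system.
        return max(0.0, round(current_share - step, 2))
    # Error rate is acceptable: move a bit more traffic across.
    return min(1.0, round(current_share + step, 2))


if __name__ == "__main__":
    share = 0.1
    # (mishandled, handled) per period, invented numbers
    for mishandled, handled in [(1, 1000), (2, 2000), (90, 3000), (5, 2000)]:
        share = next_share(share, mishandled, handled)
        print(f"next period: route {share:.0%} of flights to the new system")
```

The rollback branch matters as much as the ramp-up: a phased move only reduces risk if the old system stays available to take traffic back.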

crtlook · April 1, 2008, 12:00am

The look is interesting, though harder to read. I wouldn’t keep it too long.

However, if you are going for glory, see if you can change the cursor to an underbar.

Alex · April 1, 2008, 12:00am

Kevin: I’d presume that they didn’t close T1 when T5 opened, did they?

Yes, they did, or rather they closed the British Airways check-in there. The BA operation at T1 moved over to T5 lock, stock, and barrel.

The next step is to bring the BA long haul operation at T4 up to T5…

Andy_Brice · April 1, 2008, 12:00am

I watched a documentary about the new Terminal 5 a few nights before it opened. It showed them load testing the system with the maximum number of bags (real bags).

I don’t know the full details of the failure, but I didn’t get the impression it was a software failure (for once).

PS: The new black-on-green template is almost unreadable.

DaveGJ · April 1, 2008, 12:00am

Add Sydney (YSSY) to the list - their new baggage system was the same. Jumbos taking off for Europe without a single bag on board…

Erik27 · April 1, 2008, 12:00am

Does anyone “on the outside” really know what went wrong? Software failure? Or also hardware failure? Human factor? None of those responsible will give details now, for legal reasons, so we can only guess (wrongly)…

Testing a baggage system is costly, but as someone wrote here, it had been done, and what is currently being communicated is that only the big “backlog” is a problem, not the new luggage. Is that true? We do not know (currently).

Erik

Scott_Cowan · April 2, 2008, 12:00am

It’s funny, I’ve worked on baggage control systems in the past and it’s nothing like a web project.

The old PLCs (programmable logic controllers) are just one big if-then-else.

You first design the system and install it, and then you alter the software you made in simulation. You’ll run into things like tag printers that print unreadable tags when they’re low on ink. We had one where we needed to add additional barcode scanners to get better read rates.

Start with a human backup; these systems usually have a manual encoding station to prevent the system from not knowing what to do with the bags.

The tweaks made to the system once it’s in use are what actually make it usable; you don’t expect a cutting-edge system to actually work, which is sad.
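
To give a feel for the “one big if-then-else” with a manual encoding fallback, here is a rough sketch in Python rather than real PLC ladder logic. The function name, the chute names, and the data layout are all hypothetical; the only point is that every path where the system can’t be sure ends at the manual station instead of guessing.

```python
def route_bag(tag_reads, destinations):
    """Decide where a bag goes from its barcode scanner reads.

    tag_reads: one entry per scanner the bag passed; None is a failed read
               (e.g. a tag printed by a printer that was low on ink).
    destinations: dict mapping tag id -> chute name (hypothetical data).
    """
    good_reads = [read for read in tag_reads if read is not None]
    if not good_reads:
        # No scanner could read the tag: let a human encode it.
        return "MANUAL_ENCODING"
    if len(set(good_reads)) > 1:
        # Scanners disagree about the tag: don't guess.
        return "MANUAL_ENCODING"
    tag = good_reads[0]
    if tag not in destinations:
        # Tag read fine, but the system has no destination for it.
        return "MANUAL_ENCODING"
    return destinations[tag]


if __name__ == "__main__":
    chutes = {"BA0123": "CHUTE_7"}
    print(route_bag([None, "BA0123"], chutes))  # second scanner saves the read
    print(route_bag([None, None], chutes))      # unreadable tag
```

Adding a scanner just adds one more entry to `tag_reads`, which is roughly why extra scanners improve read rates: a bag only needs one good read to avoid the manual station.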

MikeD · April 2, 2008, 12:00am

@Erik: I heard a rumour, after posting my initial thoughts, that actually BAA, the airport owner, had been dry-running the system for months and had some very experienced handlers by the end of the testing.

Then they handed over to BA, who fired all the people who’d been doing the testing and brought over their own staff from Terminal 1 with minimal training. Result: no-one knew what they were doing.

Dual-running was inevitably going to lead to some people losing their jobs - ultimately they’d be overstaffed - but they could have mixed teams of experienced and inexperienced people, brought more people over from T1 when the first ones were confident, and repeated the process as the other flights moved across.

I do wonder a little whether unions were involved.

Peter · April 4, 2008, 12:00am

With DIA, the engineers who had prior experience building these baggage handling systems said it would take 4 years to build such a system. The mismanagers said “the airport opens in 2 years, so we promised them it would be built when the airport opens.”

The DIA baggage handling system ended up going into operation 2 years after the airport opened, exactly on the schedule the engineers had said it would take.

Was the DIA baggage handling system built “on time” or was it “2 years late?”

Brian · April 12, 2008, 12:00am

I worked on a consulting gig in Denver during that time period and, though it sounds like I was a statistical anomaly, I had a great experience with the automated baggage handling system. I flew in every Monday morning for about 3 months. I got off my plane, jumped on the train to the baggage terminal, rode up the long escalator, rounded the corner and I could literally stick my hand out and my bags would be just emerging onto the conveyor belt where I could grab them. The guys I traveled with thought I was full of it (which I usually am) until they went with me and saw it for themselves. I guess the sun shines on a dog’s ass every once in a while, as they say, but in one man’s view it wasn’t a complete failure… On the other hand, it’s not like the Therac failure of the 1980s, but the same underlying principles were at work…Canadians.

Jonas_Klker · February 3, 2009, 12:00am

The thing about airports is that they suck and they always will suck.

At least the jet engines will…

Jon_Limjap · February 6, 2010, 12:00am

So, there was virtually no load-testing on these things?

Fortunately for web applications, load can be simulated. But how do you simulate 15,000 pieces of luggage that each weigh 30 - 70 lbs?

Don · February 6, 2010, 12:00am

“The fiasco is rapidly turning into a national humiliation …”
“‘It’s a national disgrace and a national humiliation,’ [Liberal Democrat MP Alistair Carmichael] said.”

Even if the airport terminal personnel are great people, you need to think about the whole process. People are part of the system too, even the customers who try to keep up with everything that is going on.

Well, they did practise the process, but there were glitches. It looks like the glitches were not fixed after practising.

JohnF · February 6, 2010, 12:00am

The thing about software is that it is really easy to do something complicated. The bit where the engineering comes in is that it is really difficult to find all the risky edge conditions, handle failure gracefully AND scale up to do lots of even that one thing.

If we didn’t have hubris we wouldn’t do anything exciting and we wouldn’t progress.

Guy_Rixon · February 6, 2010, 12:00am

Re Mike Dimick’s comment: yes, all good risk-reduction measures, but could it actually be done? They can’t move all their staff to the test (or T4 dies). They can’t get a proper, full test with new staff (costs too much to train, and new hands may not work the system the same way as old hands). So they’d have to test with a fraction of the old staff at a fraction of capacity and hope the result scales.

I wonder: is this new system so centralized that you can’t test a small bit and scale the results? E.g., a few high-capacity conveyors rather than many small ones.