Thursday, January 29, 2009

What getting it wrong means

Today was the 23rd anniversary of the Challenger disaster: January 28th, 1986.

I was one of the schoolchildren NASA had arranged to watch the Challenger launch via closed-circuit TV. I remember sitting there in science class, with gray-haired, floral-printed Mrs. Burke and the kids I'd been with since kindergarten all around me.

The countdown seemed to take forever, and then came the engines, and the steam and smoke, and it took FOOOOREVER to lift off; but there it went.

73 seconds...

When you're a kid, 73 seconds seems like an awfully long time.

Most of the kids were already starting to turn away, bored; but I was still watching, and so was Mrs. Burke.

73 seconds...

Honestly, I don't remember seeing the explosion. I know I was watching, I know I saw it, and I remember the emotions... confusion, anger, fear, sorrow, more confusion... but I don't remember seeing the explosion.

What I remember most is Mrs. Burke gasping and crying. I'd never seen a grownup outside my own family cry in public before. And in the halls you could hear more crying. More grownups crying.

We were all sent home that day. Everyone's faces looked wrong. Everyone knew that those people had died, but bigger than that, something great had been badly wounded that day.

That's what happens when engineers get it wrong.

Occasionally in my work I have been asked why I go to such lengths to get everything right.

Everyone who knows me knows that I am absolutely driven to get things right. There are a lot of reasons for that, involving my family, my ego, and just my general character; but there's also something that was absolutely ingrained in me during my education.

The question always shocks me, because I can't imagine why anyone wouldn't try to do things right whenever possible. After all, it's your job, and any job worth doing is worth taking pride in. But I also have a very specific example, and a very specific reason, to explain it.

My degrees are in aerospace engineering and computer science. My aerospace engineering degree taught me to get it right no matter what it takes, because aerospace engineers can't afford to be wrong.

My degree advisor (a famous safety expert, actually, who was involved in the Challenger investigation) said something to all of us that has stuck with me ever since: "When a programmer screws up, maybe a few hundred people lose some data. When an aerospace engineer screws up, a few hundred people die."

Remember Challenger? That's what happens when aerospace engineers get it wrong.

Challenger blew up because the rubber O-rings sealing a joint in one of its solid rocket boosters were not as resilient as they needed to be in the cold. It was about 36°F on launch morning, far colder than for any previous shuttle launch.

It's an awfully small mistake to cause such an awfully big problem, but that is the nature of the beast.

The engineers responsible for the O-rings warned against launching in the cold, but their managers pressed them to reconsider, arguing that enough safety factor had been designed in that things wouldn't go wrong. They were told to "take off your engineering hat and put on your management hat"; and in their management hats, they decided that the risks were low enough, and keeping the launch on schedule important enough, that they should continue.

They were obviously, tragically, wrong.

Richard Feynman (a personal hero of mine) was a member of the Rogers Commission investigating the accident, and he famously wrote:

"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled."

No, it cannot. The laws of physics do not forgive error.

There's another example that struck me when I first read about it as a child, and it has had a profound impact on me ever since, informing everything that I do.

Do you remember what the first jetliner was?

Most people remember it as the Boeing 707, and it was indeed the first commercially successful jet airliner; but the first jetliner to enter service was in fact the de Havilland Comet, in 1952.

The reason people don't remember the Comet as the first jetliner is that it was withdrawn from service in 1954, after suffering five crashes in two years that killed 109 people. A redesigned Comet didn't return to service until late 1958, only weeks before the Boeing 707, which quickly became the dominant airliner.

It turns out that there were two very small errors in the design, and they had very large consequences.

The first problem was that the leading edge of the engine inlets was curved a little too sharply, causing the engines to lose power at certain angles. This caused the first two crashes, both within the Comet's first year of service.

The second mistake was even smaller, but was far more serious.

The Comet was not only the first jetliner, but also the first airliner in service that flew as high, or through such temperature extremes. This of course has an impact on the aircraft, which is, after all, made of aluminum only as thick as heavy paper.

When an aircraft is pressurized, it turns into an aluminum balloon, stretching very slightly; it contracts slightly again when depressurized. While pressurized (well... at all times, really, but the effect is greater while pressurized), turning, climbing, and descending stretch, compress, and stress the aircraft in many ways.

This is reasonably well understood, and was even then; but this was a whole new category of aircraft. The only comparable aircraft in existence at the time were military bombers, and military bombers don't have amenities like cabin windows. (The lineage later ran the other way: the Comet design became the basis for a military aircraft, the Nimrod maritime patrol aircraft, which served in the RAF for decades.)

A cabin window is, of course, a hole in the aluminum skin of the aircraft, which, as I said, is being stretched tight like a balloon.

You might have noticed, when flying in a modern jetliner, that the windows are a kind of flattened oval shape at the top and bottom, with very gently rounded corners.

The de Havilland Comet is the reason why... or rather, physics is the reason why, but the Comet is what taught us the lesson.

The windows of the Comet had roughly square corners. In engineering, sharp corners like these go by another name: stress risers, because stress tends to concentrate at those points.

Over the course of roughly a thousand flights, pressurizing and depressurizing each time, small fatigue cracks would form at the square corners of the windows. Eventually these small cracks would travel along the skin, becoming large cracks and causing the entire fuselage to fail in mid-flight.

It was a very small error, made because the designer liked the look of square windows (which earlier, unpressurized airliners had) and didn't take stress risers into account.

It was a very small error, that cost 109 people their lives.

That's what happens when you don't do everything humanly possible to get it right.

In my current job, of course, people's lives don't depend on me getting things right. I'm not a doctor, or an aerospace engineer, or a fighter pilot; but it's still important that I get things right, and not just for my own satisfaction.

In my position, if I get something important wrong, my company could lose millions, or even hundreds of millions, of dollars. People's jobs are lost over such things, and their lives change greatly for the worse.

And the thing is, you never know what's going to be important. Get the corners of windows wrong, or the flexibility of a little piece of rubber at 32 degrees Fahrenheit, and people die. The law of unintended consequences is always in play.

Remember, you can never do just one thing. No matter what you do, or try, or say; no matter what precautions you take; you cannot know all the consequences, effects and impact of your actions.

So you better get it right.