Sunday, November 24, 2013

Engineering, Government Style

I worked for most of my career at Hewlett-Packard, back when Bill & Dave still ran the company (they must be spinning in their graves). Both Bill Hewlett and Dave Packard were engineers, Stanford grads, who started up in a one-car garage. Bill Hewlett was the consummate engineer, and he gave the following advice for proper project management: design it; test it; fix it; keep fixing it until it works. That advice, taken along with regression testing, beta site testing, and more testing, leads to decent products upon introduction.

Contrast that attitude with government engineering:
* * *
For the first couple of weeks after the launch, I assumed any difficulties in the Federal insurance market were caused by unexpected early interest, and that once the initial crush ebbed, all would be well. The sinking feeling that all would not be well started with this disillusioning paragraph about what had happened when a staff member at the Centers for Medicare & Medicaid Services, the department responsible for Healthcare.gov, warned about difficulties with the site back in March. In response, his superiors told him…

[...] in effect, that failure was not an option, according to people who have spoken with him. Nor was rolling out the system in stages or on a smaller scale, as companies like Google typically do so that problems can more easily and quietly be fixed. Former government officials say the White House, which was calling the shots, feared that any backtracking would further embolden Republican critics who were trying to repeal the health care law.

The idea that “failure is not an option” is a fantasy version of how non-engineers should motivate engineers. That sentiment was invented by a screenwriter, riffing on an after-the-fact observation about Apollo 13; no one said it at the time. (If you ever say it, wash your mouth out with soap. If anyone ever says it to you, run.) Even NASA’s vaunted moonshot, so often referred to as the best of government innovation, tested with dozens of unmanned missions first, several of which failed outright.

Failure is always an option. Engineers work as hard as they do because they understand the risk of failure. And for anything it might have meant in its screenplay version, here that sentiment means the opposite; the unnamed executives were saying “Addressing the possibility of failure is not an option.”

* * *
The management question, when trying anything new, is “When does reality trump planning?” For the officials overseeing Healthcare.gov, the preferred answer was “Never.” Every time there was a chance to create some sort of public experimentation, or even just some clarity about its methods and goals, the imperative was to avoid giving the opposition anything to criticize.

At the time, this probably seemed like a way of avoiding early failures. But the project’s managers weren’t avoiding those failures. They were saving them up. The actual site is worse—far worse—for not having early and aggressive testing. Even accepting the crassest possible political rationale for denying opponents a target, avoiding all public review before launch has given those opponents more to complain about than any amount of ongoing trial and error would have.

In his most recent press conference about the problems with the site, the President ruefully compared his campaigns’ use of technology with Healthcare.gov:

And I think it’s fair to say that we have a pretty good track record of working with folks on technology and IT from our campaign, where, both in 2008 and 2012, we did a pretty darn good job on that. [...] If you’re doing it at the federal government level, you know, you’re going through, you know, 40 pages of specs and this and that and the other and there’s all kinds of law involved. And it makes it more difficult — it’s part of the reason why chronically federal IT programs are over budget, behind schedule.

It’s certainly true that Federal IT is chronically challenged by its own processes. But the biggest problem with Healthcare.gov was not timeline or budget. The biggest problem was that the site did not work, and the administration decided to launch it anyway.

This is not just a hiring problem, or a procurement problem. This is a management problem, and a cultural problem. The preferred method for implementing large technology projects in Washington is to write the plans up front, break them into increasingly detailed specifications, then build what the specifications call for. It’s often called the waterfall method, because on a timeline the project cascades from planning, at the top left of the chart, down to implementation, on the bottom right.

Like all organizational models, waterfall is mainly a theory of collaboration. By putting the most serious planning at the beginning, with subsequent work derived from the plan, the waterfall method amounts to a pledge by all parties not to learn anything while doing the actual work. Instead, waterfall insists that the participants will understand best how things should work before accumulating any real-world experience, and that planners will always know more than workers.

This is a perfect fit for a culture that communicates in the deontic language of legislation. It is also a dreadful way to make new technology. If there is no room for learning by doing, early mistakes will resist correction. If the people with real technical knowledge can’t deliver bad news up the chain, potential failures get embedded rather than uprooted as the work goes on.
In short, power and politics swamp out any actual engineering. This, in spades, for an ideologically polarized issue.
With a site this complex, things were never going to work perfectly the first day, whatever management thought they were procuring. Yet none of the engineers with a grasp of this particular reality could successfully convince the political appointees to adopt the obvious response: “Since the site won’t work for everyone anyway, let’s decide what tests to run on the initial uses we can support, and use what we learn to improve.”

In this context, testing does not just mean “Checking to see what works and what doesn’t.” Even the Healthcare.gov team did some testing; it was late and desultory, but at least it was there. (The testers recommended delaying launch until the problems were fixed. This did not happen.) Testing means seeing what works and what doesn’t, and acting on that knowledge, even if that means contradicting management’s deeply held assumptions or goals. In well-run organizations, information runs from the top down and from the bottom up.

One of the great descriptions of what real testing looks like comes from Valve Software, in a piece detailing the making of its game Half-Life. After designing a game that was only sort of good, the team at Valve revamped its process, including constant testing:

This [testing] was also a sure way to settle any design arguments. It became obvious that any personal opinion you had given really didn’t mean anything, at least not until the next test. Just because you were sure something was going to be fun didn’t make it so; the testers could still show up and demonstrate just how wrong you really were.

“Any personal opinion you had given really didn’t mean anything.” So it is in the government; an insistence that something must work is worthless if it actually doesn’t.

An effective test is an exercise in humility; it’s only useful in a culture where desirability is not confused with likelihood. For a test to change things, everyone has to understand that their opinion, and their boss’s opinion, matters less than what actually works and what doesn’t. (An organization that isn’t learning from its users has decided it doesn’t want to learn from its users.)

Given examples of technological success from commercial firms, a common response is that the government has special constraints, and thus cannot develop projects piecemeal, test with citizens, or learn from its mistakes in public. I was up at the Kennedy School a month after the launch, talking about technical leadership and Healthcare.gov, when one of the audience members made just this point, proposing that the difficult launch was unavoidable, because the government simply couldn’t have tested bits of the project over time.

That observation illustrates the gulf between planning and reality in political circles. It is hard for policy people to imagine that Healthcare.gov could have had a phased rollout, even while it is having one.
If you are an engineer, read the whole thing. If you are not an engineer, read the whole thing. If you are a politician, forget it; you won't comprehend it.

8 comments:

Robert Coble said...

There are certainly other project management approaches besides the "waterfall" approach. (I won't even attempt to detail them all here. Try "Agile" for one.) Sadly, there are innumerable case studies which detail the dismal FACT that approximately 2/3 of large software projects fail to even deliver ANYTHING. After millions (sometimes BILLIONS) have been spent, the "system" is declared "finished" without ever being run ONCE. For a case in point, research the IRS Tax System Modernization. Those case studies have been copiously documented in the readily available literature. However, the management involved has NEVER, EVER read ANY of that literature. I used to be amazed (when developing project plans) that senior management had NO knowledge that there were case studies available; never mind that they had never studied them.

Blind guides of the blind - it is no wonder to me that the software does not work as planned.

The old adage still applies:

"When faced with a difference between the map and the terrain, BELIEVE THE TERRAIN!"

Too many people in management positions believe that if they have a detailed project plan, it will be followed to the inevitable successful conclusion. Nothing could be further from reality. Without the flexibility to adapt to changing conditions, a project plan is nothing more than a work of fiction.

Stan said...

Robert: Right.
And there is little flexibility if there is no periodic testing, analysis and re-orientation scheduled in.

Robert Coble said...

One of the most "amusing" aspects of large project planning is "schedule negotiation." Here is a personal example.

I was part of a large group of managers who were tasked to "plan" to develop a new [large] system for the Navy ships afloat. We dutifully gathered in a conference room, every day for a week. We hired an independent consultant to "assist" with the plan development and three different large commercial project planning software packages. (None of the managers other than me had ever used a software project planner.) We "brain-stormed" every possible scenario, adjusting assumptions accordingly. At the end of the week, the senior manager took the completed plan (which naturally had ballooned beyond belief in both time required and number of technical personnel required) to the Executive Director. In less than 5 minutes, she was back in the conference room, with her tail tucked firmly up between her legs. The Executive Director had paid NO attention to anything in the plan EXCEPT how long it would take and how many people it would take, and she had immediately dictated that it was too long and too expensive in "resources" (the euphemism for technical people). We were tasked to adjust our "plan" to meet the dictated schedule and resource requirements. I immediately quit the "team". When questioned, I simply stated "We have already been given the schedule and resource allocation. Now I have to figure out how to do it within those constraints." The rest of the group wasted another week, adjusting everything so that the final schedule and resources matched the dictated requirements. (I "cheated" and kept track privately throughout the project.) The project began, and it followed the original estimate almost exactly. The really amusing thing was that the project was reported up-line to Washington as "on time and on budget" even though it was nowhere near either at any time other than day 1. No heads rolled, and the ones responsible were advanced to the next higher level of incompetence. And so it goes...

Stan said...

Har! Were you punished for leaving the project?

It works that way in industry, too. The plan has to look really swell up front in order to get the customer locked in and management happy. It seems easier for management to deal with unhappy customers than no customers at all.

My first job out of school was aerospace involving F4D fighter electronics, classified. We PERT-charted it, and it got accepted as we scheduled it. The two-man team, myself and a mentor engineer, got it in on time. But I think that was the last time.

Many times the manager would dictate a time and ask, "Can you meet that?" "Umm, I'll try" was an acceptable wiggle answer. Everyone always knew otherwise. It's a game, much like the "spend all your budget by the end of the fiscal year or it will be reduced next year" game.

Robert Coble said...

No, I didn't get punished for quitting the "team." I was known for being somewhat "crazy" anyway, but they needed my expertise, so they tolerated my nonconformist behavior. I also was the only manager that could take a project disaster and dig it out of the hole. I studied project disasters and "crunch mode" remedies simply because I had plenty of opportunities to "rescue" projects gone wild.

Occasionally, I did have a certain amount of fun, tweaking their noses while they were unsuspecting. One manager demanded a project plan without any specification of what was to be accomplished and with no specified date. When I pointed out that it would not be possible to plan anything under those conditions, she replied, "UNACCEPTABLE!" So, I said it would take about 20 years, long enough for me to retire. I got another "UNACCEPTABLE!" Next, I tried using two weeks, with the same result: "UNACCEPTABLE!" So, I told her to go make her own damn schedule. She went off in a snit. I had access to several Dilbert cartoons and kept them around my office space. I made a new Dilbert cartoon, with the pointy-haired manager (her) and Dilbert (me, naturally), having that same conversation. I put it on the office wall in a prominent place, and waited for her to come back to see me about that schedule. About two days later, she came back. She always liked to read the Dilbert cartoons I had on the wall, so she saw the "new" one immediately, and soon was laughing. The laughter died immediately when I pointed out that it was not really a Scott Adams cartoon, but a "Crazy" Bob cartoon about our previous conversation. She stormed off again, ready to kill me. It took her a couple of weeks to come back with a preliminary specification.

Stan said...

If there's no Dilbert on the wall, then it's not an engineer's place. The editors tried to take Dilbert out of the newspaper when I was in Colorado Springs many years ago, and there was nearly a riot over it. Dilbert stayed.

I once had a new manager tell me "Don't bring me any problems! Bring me only solutions." I solved that by being gone in a week or so. Not long after, he was relieved of duty.

I bet we could compare notes for quite a while on these subjects!

I once had a manager tell me to record everything I did during the day, and keep doing it. So I did, including the time it took me to write it down, and the overtime I put in, too. He didn't last very long either. A friend once calculated his MTBB - mean time between bosses - as 2.5 months over a three-year period.

Steven Satak said...

"Don't tell me what you can't do, tell me what you CAN do". That precious gem from a former boss, a Senior Chief who thought he was all that and a bag of chips.

Of course, occasionally he did come up with some good stuff. My personal favorite was one I quote to my son every so often:

"The only thing worse than bad news is LATE bad news". So very true.

Russell said...

Robert and Stan, I work for a large corporation as a programmer, and the problems are there, too. Almost to a T.

"And I think it’s fair to say that we have a pretty good track record of working with folks on technology and IT from our campaign"

Let's just ignore the fact that his tech team leveraged existing platforms that had been developed and succeeded commercially first, before the Hope-n-Change team even showed up.