The Myth Of Risk Management

Insure Your Project Against the Unknown and the Unknowable

Article by Pete McBreen | Fri, 05/30/2008 - 1:14pm
Summary:

Risk management is an illusion. We must recognize that software projects are inherently risky and admit to ourselves that it's not the known problems that are going to cause our projects to fail. It's the risks that are unmentionable, uncontrollable, unquantifiable, or unknown that make projects crash and burn.

Risk management is an illusion. Risks cannot be managed in software development projects because they are either unmentionable, uncontrollable, unquantifiable, or unknown. Projects run into problems because they are hit by something that everyone knew about but was unable to talk about, something totally unexpected, something that was rated as highly unlikely, or something that was made worse by the “mitigation strategy.” 

If risks were truly manageable, we would see far fewer projects that end up stressed—or rather, with team members who are stressed because they are faced with a project that is not going anywhere near as planned. In practice, however, most projects end up stressed in one way or another, so a project that finishes on time, on budget, and with all of the initially planned features is a rarity (unless the numbers are adjusted as the project proceeds). 

Project Failures Are Not Simple
We would all like project life to be simple—to have a magic fix that would make project failures go away—but there is no such thing. Indeed, it seems that managers are trained to assume that project failures are due to a single specific cause—misunderstanding of what the users need, key items delivered late, defect counts rising as the project nears its release date, or loss of key people. 

Reality suggests otherwise: Projects rarely fail because of one single identifiable cause. A classic example of this is the Challenger disaster. Yes, the “ultimate” cause made for a great sound bite on TV, with Dr. Richard Feynman showing that the O-rings were not elastic at low temperatures, but it was not as simple as that. The minority report [1] written by Feynman showed that there were many other things that were disasters just waiting to happen. 

To avoid being taken in by the myth of risk management, we must recognize that software projects are inherently risky and failure prone. Rather than assume we can identify and “mitigate” the risks we will face, or assume that this time our project will be different, we must assume that we are doing something that is leading our project closer to failure.

While it is true that there are some things for which every project should prepare, this is a short, generic, and generally useless list. Rather, you should focus on the following items:

  • Politically unmentionable risks
  • Uncontrollable factors
  • Unquantifiable risks
  • The unexpected

Politically Unmentionable Risks
Many organizations have the habit of “shooting the messenger” whenever there is bad news. So when a project crashes and burns, the developers and the users are not at all surprised, but the “senior” managers, who were supposedly providing “adult supervision” of the project, are utterly shocked that the project has failed, or they simply declare “mission accomplished” and leave the users to deal with the mess of a buggy, partially implemented system.

I recently observed this on an important project with a mandated completion date dependent on procuring some hardware that was required for a five-week suite of system tests. The hardware arrived a month late, but five weeks after the originally planned start of system testing, “questions were being asked” about why the testing was not yet complete. Several months later, the schedule slip had not been officially recognized in the project plan; instead there were repeated pronouncements that the timeline was “not allowed to slip.” Eventually, when the original project completion date was near, management was forced to acknowledge the slip.

When the people in charge of managing the risk are the ones creating the risk, it is not hard to see that risk management is an illusion. Examples include:

  • Hoping that a major slippage can be caught up
  • Mandating that a project be completed by an arbitrary due date that is a “stretch target”
  • Delaying the start of a project but still expecting the original completion date
  • Assigning unskilled developers to a project but not allocating any time for training and mentoring

It should be possible for team members to point out these problems and for management to correct the mistakes. Unfortunately, life does not work that way. In fiction it might be OK to say that the emperor is ideally dressed for a party at the nudist beach, but in most organizations telling the truth is a classic “bad career move.”

So what really happens? All minor risks are identified and “mitigated” while there is a big elephant in the room that nobody is talking about. Sometimes, if you are lucky, you can get an outsider to raise the issue, but in most cases it just sits there unmentioned, the project crashes and burns, and the failure is attributed to other causes, generally something outside everyone’s control.

There is a simple fix for project politics, but it can be difficult to implement: Ensure that there is a balance of power between the business and technology sides of the project. This is necessary so that each side can hold the other accountable for its promises and decisions. Too often this is a one-way proposition.

Uncontrollable Factors
Whenever I need some light relief on a project, I look at the mitigation strategies that are in place for the various risks that have been identified. In most cases the “strategy” is nothing more than a reactive policy to work harder, spend more money, reduce the project scope, or slip the schedule. Rarely does the mitigation strategy include anything that implies that the team will have to think deeply about how to address the actual circumstances at the time the risk is detected.

Most risk mitigation strategies fail to take into account that software development is not predictable; there is an essential, uncontrollable uncertainty and randomness to it. Back near the dawn of software development, when people actually did empirical studies of how projects turned out, they found that early estimates of project size could easily be off by a factor of four [2]. So something that was estimated to take one person-year could take anywhere from three person-months to four person-years.
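
To make that range concrete, here is a minimal Python sketch of the arithmetic. The function name estimate_range and its parameters are illustrative assumptions, not anything taken from the studies cited above.

```python
def estimate_range(estimate_person_months, factor=4.0):
    """Return (low, high) bounds for an early estimate that could be
    off by `factor` in either direction."""
    return estimate_person_months / factor, estimate_person_months * factor


# One person-year expressed as 12 person-months.
low, high = estimate_range(12)
print(f"{low:g} to {high:g} person-months")  # 3 to 48 person-months
```

Forty-eight person-months is the four person-years mentioned above; the point of the sketch is only how wide the band is compared with the single number that usually ends up in the plan.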

This level of uncertainty is not acceptable to senior managers, so most organizations have mandated a limit to the variability of estimates. Often a project is allowed to miss its estimates by up to 20 percent (maybe even 50 percent in more enlightened organizations). But all the managers do by mandating a 20 percent limit is make it politically unacceptable to talk about the real project variability, which is twenty times larger.

Organizations must adopt approaches that acknowledge and embrace uncertainty because many factors that influence a project are outside of the project’s control. This means organizations must select leaders who are open to alternatives and are not afraid of surprises. Unfortunately, most organizations do not follow this practice, in part because it is not easy to find such leaders.

Unquantifiable Risks
In games of chance, the probability of an adverse event can be determined. However, the probability of such an event in a software development project is essentially unknowable because software projects are fundamentally human activities, and humans are unpredictable. Even in games of chance where the probabilities are directly computable, most people will drastically overestimate their chance of winning, as is evidenced by the number of people who buy lottery tickets. Yes, someone will eventually win the big prize, but the chance that it will be you is usually less than one in ten million, so the expected return on a $1 ticket is in most cases less than fifty cents. (Lotteries have expenses, and many are designed to raise money for charity, so few lotteries pay out more than 50 percent of the money they bring in.)
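
The lottery arithmetic can be sketched in a few lines of Python. The $5 million jackpot is a hypothetical figure chosen only so that the numbers line up with the one-in-ten-million odds and the fifty-cent return mentioned above; real lotteries have many prize tiers.

```python
def expected_return(prizes):
    """Expected payout of one ticket, given (probability, prize) pairs."""
    return sum(probability * prize for probability, prize in prizes)


# Hypothetical single-prize lottery: a 1-in-10-million chance at $5 million.
single_jackpot = [(1 / 10_000_000, 5_000_000)]
value = expected_return(single_jackpot)
print(f"Expected return on a $1 ticket: ${value:.2f}")
```

Even in this generous case the ticket returns about half of what it costs, and that is a game where the odds are published. A software project offers no such table of probabilities.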

We are not good at assigning probabilities to events, even when we have good statistical data, and we fail miserably when we don’t have any historical data. A good example is how much certainty we ascribe to our estimates. We typically believe our estimates are within 50 percent even though empirical studies show that estimates can be off by a factor of four.

Even when we have good statistical data, it is not always available to the project team. For example, when planning a project, it is useful to know the level of staff turnover. If it is more than 15 percent, there is a good chance that one in six team members will leave before completing a one-year project. But many companies will not make this information available to project managers, so although they know that their project will have some turnover, they cannot quantify the risk.
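
As a rough illustration of what a project manager could do with the turnover number, here is a small Python sketch. It assumes that each team member leaves independently with the same annual probability, which is a simplification; the function names and the team size of six are illustrative.

```python
def expected_departures(team_size, annual_turnover):
    """Average number of people expected to leave during one year."""
    return team_size * annual_turnover


def prob_at_least_one_departure(team_size, annual_turnover):
    """Chance that one or more team members leave during one year,
    assuming each person leaves independently of the others."""
    return 1 - (1 - annual_turnover) ** team_size


team_size, turnover = 6, 0.15
print(f"Expected departures: {expected_departures(team_size, turnover):.1f}")
print(f"Chance at least one person leaves: "
      f"{prob_at_least_one_departure(team_size, turnover):.0%}")
```

With 15 percent turnover and a team of six, this works out to roughly one expected departure and better than even odds of losing at least one person during a one-year project.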

Even if the project manager is given the overall corporate turnover numbers, the risk is still unquantifiable because the circumstances of the project can change the probability that team members will leave. If a project is afflicted with particularly bozo decision making, then the team may suffer an almost complete turnover of staff. Even worse, you may find that although a central core survives, there is rapid attrition of all new hires, which could not have been predicted given the overall average turnover for the organization.

Dealing with unquantifiable risks requires accepting that not everything is reducible to numbers and probabilities. It requires senior managers who are comfortable with unknowns and who are willing to accept that some things are unknowable. Risk-averse managers cause many projects to fail because they make “safe” decisions that end up putting the entire project at risk.

The Unexpected (No One Expects the Spanish Inquisition!)
Even in organizations that seem to identify risks early, there always are some items that just pop up to cause a major impact on a project. Even worse, these often are things that catch everyone by surprise but seem obvious in hindsight. A good example of this is the “slashdot effect” [3], in which your Web site goes from total obscurity to overwhelming popularity and then crashes due to overload. Yes, in hindsight it is obvious that Web sites should be designed to support large numbers of visitors, but as recently as five years ago, major corporations with eCommerce Web sites were regularly failing whenever they had a sale or major promotion.

The problem with drawing attention to such unexpected events, however, is that someone, somewhere will add them to a risk checklist for all future projects to assess. But again, it is not the events that you are prepared for that cause the biggest problems; it is the events that catch you by surprise. Few, if any, surprises in software development result in a project that is easier to deliver; most surprises mean that something has gone wrong.

Big checklists of possible risks are not much use. After all, the likelihood that your building will be destroyed is very low, but because we have seen some very visible examples of this type, most organizations now have some sort of disaster recovery plan in place. That means the organization is spending a substantial amount of money to address a single, unlikely risk and thereby has less money to spend on figuring out ways to respond to all of the other types of risks or on improving existing systems.

Most risk management plans, while useful in a CYA way, have no impact on the overall success of projects. The risks that make it onto the risk mitigation lists are interesting in an academic sense, but not all that useful. Good project managers know that suppliers may be late so they schedule appropriately; they do not list this as an item in the risk mitigation plan. I’ll say it again: It is not the known risks that cause problems; it is the completely unexpected things that get you. Too many projects crash and burn because senior managers are completely out of touch with the realities of their projects.

Flexibility is the key to surviving the unexpected. Rather than making detailed project plans and creating interminable mitigation strategies for every conceivable risk only to be blindsided by something that wasn’t in the plans, organizations need to understand that bad stuff happens.

Forget about Risk Management, Create Your Own Insurance
The current methods used to manage risk are obviously failing or there would not be as many stressed projects. An “insurance model” of project management may sound a little crazy, but having insurance is usually less risky or expensive than the alternative. What would project insurance look like? What would we be insuring against?

The insurance should cover:

  • Incomplete information
  • Learning the wrong lessons
  • Death by a thousand cuts
  • Late-breaking news

Incomplete Information
All projects must insure themselves against incomplete information, especially information about how a project is tracking. While most management teams set up structures that keep them informed of project concerns, most managers want to be brought “solutions, not problems.” As a result they live in a world that is out of touch with the underlying project realities because nobody is allowed to raise an issue without having a well-thought-out plan for addressing the problem. The result of this policy is that intractable problems are not being raised as issues.

It’s common among developers to say that things would go much better if managers would just listen to them and to users, but sadly, such managers don’t get far in today’s management culture. Even the Big Visible Charts that are designed to make issues visible to the team have little impact. With enough political pressure, the number of critical defects “magically drops” as the deadline to go live approaches, but this does nothing to mitigate risk. As journalist George Monbiot [4] reminds us, “Tell people something they know already and they will thank you for it. Tell them something new and they will hate you for it.”

Learning the Wrong Lessons
We would hope that organizations learn through project retrospectives, but in practice few organizations bother to really analyze their failed projects; when they do, they often learn the wrong lesson. There are so many ways to really mess up a project that it is easy to focus on a specific problem in a project and then mandate that all future projects must do something to mitigate that problem. So, for example, we see many companies requiring a requirements document to be signed off (“frozen”) by a senior manager before design can start because one project had a problem with requirements changing during the project.

A good example of learning the wrong lessons is the way NASA responded to the Challenger disaster discussed earlier in this article. Yes, there was an inquiry and a report, but it did not fundamentally change the way that people went about their jobs. What is especially tragic about this is that Richard Feynman had to write a minority report to get his findings included, and even then, very few people responded to his key message: “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”

As NASA has shown, it is difficult to avoid learning the wrong lesson. Yes, there were some specific issues with the technology, but NASA’s main failing was the cultural one of downplaying the risks and trying to sell the idea that spaceflight was becoming routine. Insuring against this is difficult because cultures are resistant to learning new ways of doing things. Things can be changed within the existing framework, but changing the framework requires revolution.

Death by a Thousand Cuts
What managers often fail to understand is how seemingly minor things can result in project failure: simple things like a few days’ delay in getting access to other required systems. Without that access, the developers must guess how the other system operates, and a few weeks or months later, that guess turns out to be wrong.

Projects fail because a lot of little things go wrong, and by the time these minor things have made their way through the reporting structures and become visible, the project is headed for disaster. Insuring against this requires the invention of a reporting mechanism that will allow the implications of a set of little issues to reach the project decision makers much earlier.

Late-Breaking News
Insurance also is needed for late-breaking news about project breakage. Typically there is ample early warning that things are not quite as they should be, but normally there is no one thing that is really specific, just a lot of little hints that something bad is likely to happen.

Although there is no simple mechanism for early detection of issues, the agile approaches have made it much easier. They have done this by taking the deadline, a much-loved project management tool, and making it a regular part of the day. Living with hard deadlines every day makes it virtually impossible for emerging problems to escape notice.

On the micro-scale, many agile teams that now use test-driven development or a similar practice that ensures clean code have a daily milestone for checking their integrated, tested, working code into the source code repository. With this daily milestone in place, teams can easily see when code quality is decreasing if the effort required each day to check in code starts to increase. Similarly, there is an early warning about progress, or the lack thereof, by the amount of functionality that is added every day.

Some teams may ignore this daily feedback, but with diligence, the discipline of the daily deadline constitutes an effective early warning system about declines in code quality and team productivity.
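
One way a team might turn the daily check-in into an explicit warning signal is sketched below in Python. This is only an illustration under stated assumptions: the team records the hours spent each day getting code integrated and checked in, and the function name, window size, and threshold are all hypothetical choices rather than an established agile practice.

```python
from statistics import mean


def effort_is_rising(daily_hours, window=5, threshold=1.25):
    """Return True when the average check-in effort over the most recent
    `window` days exceeds the previous window's average by `threshold`."""
    if len(daily_hours) < 2 * window:
        return False  # not enough history to compare two windows
    recent = mean(daily_hours[-window:])
    previous = mean(daily_hours[-2 * window:-window])
    return recent > previous * threshold


# Hypothetical hours spent each day getting code integrated and checked in.
hours = [1.0, 1.0, 1.5, 1.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0]
print(effort_is_rising(hours))  # True
```

Comparing two short windows rather than single days keeps one bad afternoon from raising the alarm while still surfacing a sustained drift within a couple of weeks. The same comparison could be applied to the amount of functionality added per day, with the inequality reversed.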

On a larger scale, the agile practice of having two-week to one-month development iterations provides an excellent deadline for measuring project progress. At the start of each iteration the team agrees with the key stakeholders about what team members will deliver, and at the end they show that the team has delivered what was agreed. This provides yet another early warning mechanism for all stakeholders about the ability of the team to deliver.

Focus on Becoming Flexible and Reactive
The most important bit of insurance a project can buy, however, is insurance against not comprehending that things are changing. A project needs to be viewed as a learning environment for everyone involved. If new information is not coming to light every day and being acted upon, no one is learning anything about the project’s true status, and the project is likely to fail. Constantly monitoring progress takes much of the drama out of uncertainty, and, therefore, much of the risk out of a project. Variability in project parameters is natural; being continually aware, informed, and responsive needs to be just as natural.

Rather than planning for specific, known risks, projects should work from the assumption that something will occur that impacts the project, and create mechanisms that provide for the early detection and response to such events. It is not the known things that cause us problems. It’s the unknown and unknowable that cause projects to crash and burn.

As an insurance policy, learning is rare but crucial to project success. Perhaps it was best summed up by Multics wizard Tom Van Vleck [5], who said, “You learn something every day, unless you’re careful.”


References:

  1. Appendix F - Personal observations on the reliability of the Shuttle
  2. Boehm, B. Software Engineering Economics. Prentice Hall, 1981.
  3. Slashdot effect
  4. Monbiot.com
  5. Tom Van Vleck
About The Author: Pete McBreen

Pete McBreen is the author of Software Craftsmanship and Questioning Extreme Programming. He is an independent consultant who actually enjoys writing and delivering software. Despite spending a lot of time writing, teaching, and mentoring, he goes out of his way to ensure that he does hands-on coding on a live project every year. Pete specializes in finding creative solutions to problems that software developers face. After many years of working on formal and informal process improvement initiatives, he took a sideways look at the problem and realized, "Software development is meant to be fun. If it isn't, the process is wrong." Pete lives in Cochrane, Alberta, Canada and has no plans to move back to a big city.