The Myth Of Risk Management
Insure Your Project Against the Unknown and the Unknowable
Risk management is an illusion. We must recognize that software projects are inherently risky and admit to ourselves that it's not the known problems that are going to cause our projects to fail. It's the risks that are unmentionable, uncontrollable, unquantifiable, or unknown that make projects crash and burn.
Risk management is an illusion. Risks cannot be managed in software development projects because they are either unmentionable, uncontrollable, unquantifiable, or unknown. Projects run into problems because they are hit by something that everyone knew about but was unable to talk about, something totally unexpected, something that was rated as highly unlikely, or something that was made worse by the “mitigation strategy.”
If risks were truly manageable, we would see far fewer projects that end up stressed—or rather, with team members who are stressed because they are faced with a project that is not going anywhere near as planned. In practice, however, most projects end up stressed in one way or another, so a project that finishes on time, on budget, and with all of the initially planned features is a rarity (unless the numbers are adjusted as the project proceeds).
Project Failures Are Not Simple
We would all like project life to be simple—to have a magic fix that would make project failures go away—but there is no such thing. Indeed, it seems that managers are trained to assume that project failures are due to a single specific cause—misunderstanding of what the users need, key items delivered late, defect counts rising as the project nears its release date, or loss of key people.
Reality suggests otherwise: Projects rarely fail because of one single identifiable cause. A classic example of this is the Challenger disaster. Yes, the “ultimate” cause made for a great sound bite on TV, with Dr. Richard Feynman showing that the O-rings were not elastic at low temperatures, but it was not as simple as that. The minority report [1] written by Feynman showed that there were many other things that were disasters just waiting to happen.
To avoid being taken in by the myth of risk management, we must recognize that software projects are inherently risky and failure prone. Rather than assume we can identify and “mitigate” the risks we will face, or assume that this time our project will be different, we must assume that we are doing something that is leading our project closer to failure.
While it is true that there are some things every project should prepare for, that list is short, generic, and generally useless. Instead, you should focus on the following:
- Politically unmentionable risks
- Uncontrollable factors
- Unquantifiable risks
- The unexpected
Politically Unmentionable Risks
Many organizations have the habit of “shooting the messenger” whenever there is bad news. So when a project crashes and burns, the developers and the users are not at all surprised, but the “senior” managers, who were supposedly providing “adult supervision” of the project, are utterly shocked that the project has failed, or they simply declare “mission accomplished” and leave the users to deal with the mess of a buggy, partially implemented system.
I recently observed this on an important project with a mandated completion date that depended on procuring some hardware required for a five-week suite of system tests. The hardware arrived a month late, but five weeks after the original planned start of system testing, “questions were being asked” about why the testing was not yet complete. Several months later, the schedule slip still had not been officially recognized in the project plan; instead, there were repeated pronouncements that the timeline was “not allowed to slip.” Eventually, when the original project completion date was near, management was forced to acknowledge the slip.
When the people in charge of managing the risk are the ones creating the risk, it is not hard to see that risk management is an illusion. Examples include:
- Hoping that a major slippage can be caught up
- Mandating that a project be completed by an arbitrary due date that is a “stretch target”
- Delaying the start of a project but still expecting the original completion date
- Assigning unskilled developers to a project but not allocating any time for training and mentoring
It should be possible for team members to point out these problems and management to correct the mistakes. Unfortunately, life does not work that way. In fiction it might be OK to say that the emperor is ideally dressed for a party at the nudist beach, but in most organizations telling the truth is a classic “bad career move.”
So what really happens? All minor risks are identified and “mitigated” while there is a big elephant in the room that nobody is talking about. Sometimes, if you are lucky, you can get an outsider to raise the issue, but in most cases it just sits there unmentioned, the project crashes and burns, and the failure is attributed to other causes, generally something outside everyone’s control.
There is a simple fix for project politics, but it can be difficult to implement: Ensure that there is a balance of power between the business and technology sides of the project. This is necessary so that each side can hold the other accountable for its promises and decisions. Too often this is a one-way proposition.
Uncontrollable Factors
Whenever I need some light relief on a project, I look at the mitigation strategies that are in place for the various risks that have been identified. In most cases the “strategy” is nothing more than a reactive policy to work harder, spend more money, reduce the project scope, or slip the schedule. Rarely does the mitigation strategy include anything that implies that the team will have to think deeply about how to address the actual circumstances at the time the risk is detected.
Most risk mitigation strategies fail to take into account that software development is not predictable—there is an essential, uncontrollable uncertainty and randomness to it. Back near the dawn of software development, when people actually did empirical studies of how projects turned out, they found that early estimates of project size could easily be off by a factor of four [2]. So something that was estimated to take one person-year could take anywhere from three person-months to four person-years.
This level of uncertainty is not acceptable to senior managers, so most organizations have mandated a limit to the variability of estimates. Often a project is allowed to miss its estimates by up to 20 percent (maybe even 50 percent in more enlightened organizations). But all the managers do by mandating a 20 percent limit is make it politically unacceptable to talk about the real project variability, which is twenty times larger.
Organizations must adopt approaches that acknowledge and embrace uncertainty because many factors that influence a project are outside of the project’s control. This means organizations must select leaders who are open to alternatives and are not afraid of surprises. Unfortunately, most organizations do not follow this practice, in part because it is not easy to find such leaders.
Unquantifiable Risks
In games of chance, the probability of an adverse event can be determined. In a software development project, however, the probability of such an event is essentially unknowable, because software projects are human activities, and humans are unpredictable. Even in games of chance where the probabilities are directly computable, most people drastically overestimate their chance of winning, as evidenced by the number of people who buy lottery tickets. Yes, someone will eventually win the big prize, but the chance that it will be you is usually less than one in ten million, so the expected return on a $1 ticket is in most cases less than fifty cents. (Lotteries have expenses, and many are designed to raise money for charity, so few lotteries pay out more than 50 percent of the money they bring in.)
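The lottery arithmetic can be written as a one-line expected-value calculation. Assume every ticket costs $1 and that, averaged over draws, a fraction r of ticket revenue is paid back as prizes (r is a label introduced here, not one the article uses). Summing each prize's value v_i times the probability p_i that a given ticket wins it gives:

```latex
\mathbb{E}[\text{winnings per \$1 ticket}]
  = \sum_i p_i \, v_i
  = \frac{\text{total prize money paid out}}{\text{number of tickets sold}}
  = r \times \$1
  \le \$0.50 \quad \text{whenever } r \le 0.5
```

The odds on any individual prize do not change this bound; only the payout fraction r matters.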
We are not good at assigning probabilities to events, even when we have good statistical data, and we fail miserably when we don’t have any historical data. A good example is how much certainty we ascribe to our estimates. We typically believe our estimates are within 50 percent even though empirical studies show that estimates can be off by a factor of four.
Even when we have good statistical data, it is not always available to the project team. For example, when planning a project, it is useful to know the organization’s annual rate of staff turnover. If it is more than 15 percent, there is a good chance that one in six team members will leave before the end of a one-year project. But many companies will not make this information available to project managers, so although project managers know that their project will have some turnover, they cannot quantify the risk.
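To make the turnover arithmetic concrete, here is a minimal sketch that estimates a team's exposure. It assumes, purely for illustration, that each person leaves independently at a constant and known annual rate; `turnover_exposure` is a hypothetical helper, not part of any tool, and real projects rarely satisfy either assumption.

```python
def turnover_exposure(annual_rate: float, team_size: int,
                      years: float = 1.0) -> tuple[float, float]:
    """Return (expected departures, probability that at least one person leaves).

    Hypothetical model: every team member leaves independently,
    at the same constant annual rate.
    """
    if not 0.0 <= annual_rate < 1.0:
        raise ValueError("annual_rate must be in [0, 1)")
    # Probability that one person is still on the team after `years`.
    stays = (1.0 - annual_rate) ** years
    expected_departures = team_size * (1.0 - stays)
    # The team is untouched only if every member stays.
    p_at_least_one = 1.0 - stays ** team_size
    return expected_departures, p_at_least_one


if __name__ == "__main__":
    expected, p_any = turnover_exposure(annual_rate=0.15, team_size=6)
    print(f"expected departures: {expected:.1f}")  # expected departures: 0.9
    print(f"P(at least one leaves): {p_any:.0%}")  # P(at least one leaves): 62%
```

With six people and 15 percent annual turnover, the model predicts almost one departure in a year (roughly one person in six) and a better-than-even chance of losing someone. The answer is only as good as the rate fed into it, which is exactly the number that is often withheld or distorted by project conditions.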
Even if the project manager is given the overall corporate turnover numbers, the risk is still unquantifiable, because the circumstances of the project can change the probability that team members will leave. If a project is afflicted with particularly bozo decision making, the team may suffer an almost complete turnover of staff. Even worse, you may find that although a central core survives, there is rapid attrition of all new hires, which could not have been predicted from the organization’s overall average turnover.
Dealing with unquantifiable risks requires accepting that not everything is reducible to numbers and probabilities. It requires senior managers who are comfortable with unknowns and who are willing to accept that some things are unknowable. Risk-averse managers cause many projects to fail because they make “safe” decisions that end up putting the entire project at risk.
The Unexpected (No One Expects the Spanish Inquisition!)
Even in organizations that seem to identify risks early, there are always some items that just pop up and have a major impact on a project. Even worse, these are often things that catch everyone by surprise but seem obvious in hindsight. A good example of this is the “slashdot effect” [3], in which your Web site goes from total obscurity to overwhelming popularity and then crashes under the load. Yes, in hindsight it is obvious that Web sites should be designed to support large numbers of visitors, but as recently as five years ago, major corporations with eCommerce Web sites were regularly failing whenever they had a sale or major promotion.
The problem with drawing attention to such unexpected events, however, is that someone, somewhere will add them to a risk checklist for all future projects to assess. But again, it is not the events that you are prepared for that cause the biggest problems—it is the events that catch you by surprise. Few, if any, surprises in software development result in a project that is easier to deliver—most surprises mean that something has gone wrong.
Big checklists of possible risks are not much use. After all, the likelihood that your building will be destroyed is very low, but because we have seen some very visible examples of this type, most organizations now have some sort of disaster recovery plan in place. That means the organization is spending a substantial amount of money to address a single, unlikely risk and thereby has less money to spend on figuring out ways to respond to all of the other types of risks or on improving existing systems.
Most risk management plans, while useful in a CYA way, have no impact on the overall success of projects. The risks that make it onto the risk mitigation lists are interesting in an academic sense, but not all that useful. Good project managers know that suppliers may be late, so they schedule appropriately; they do not list this as an item in the risk mitigation plan. I’ll say it again: It is not the known risks that cause problems; it is the completely unexpected things that get you. Too many projects crash and burn because senior managers are completely out of touch with the realities of their projects.
Flexibility is the key to surviving the unexpected. Rather than making detailed project plans and creating interminable mitigation strategies for every conceivable risk only to be blindsided by something that wasn’t in the plans, organizations need to understand that bad stuff happens.
Forget About Risk Management; Create Your Own Insurance
The current methods used to manage risk are obviously failing, or there would not be so many stressed projects. An “insurance model” of project management may sound a little crazy, but having insurance is usually less risky and less expensive than the alternative. What would project insurance look like? What would we be insuring against?
The insurance should cover:
- Incomplete information
- Learning the wrong lessons
- Death by a thousand cuts
- Late-breaking news
Incomplete Information
All projects must insure themselves against incomplete information, especially information about how a project is tracking. While most management teams set up structures that keep them informed of project concerns, most managers want to be brought “solutions, not problems.” As a result, they live in a world that is out of touch with the underlying project realities, because nobody is allowed to raise an issue without having a well-thought-out plan for addressing it. The consequence of this policy is that intractable problems never get raised as issues.
It’s common among developers to say that things would go much better if managers would just listen to them and to users, but sadly, such managers don’t get far in today’s management culture. Even the Big Visible Charts that are designed to make issues visible to the team have little impact. With enough political pressure, the number of critical defects “magically drops” as the deadline to go live approaches, but this does nothing to mitigate risk. As journalist George Monbiot [4] reminds us, “Tell people something they know already and they will thank you for it. Tell them something new and they will hate you for it.”
Learning the Wrong Lessons
We might hope that organizations would learn through project retrospectives, but in practice few organizations bother to really analyze their failed projects, and when they do, they often learn the wrong lesson. There are so many ways to really mess up a project that it is easy to focus on one specific problem and then mandate that all future projects must do something to mitigate it. For example, many companies require a requirements document to be signed off (“frozen”) by a senior manager before design can start, because one project had a problem with requirements changing during the project.
A good example of learning the wrong lessons is the way NASA responded to the Challenger disaster discussed earlier in this article. Yes, there was an inquiry and a report, but it did not fundamentally change the way that people went about their jobs. What is especially tragic about this is that Richard Feynman had to write a minority report to get his findings included, and even then, very few people responded to his key message: “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”
As NASA has shown, it is difficult to avoid learning the wrong lesson. Yes, there were some specific issues with the technology, but NASA’s main failing was the cultural one of downplaying the risks and trying to sell the idea that spaceflight was becoming routine. Insuring against this is difficult because cultures are resistant to learning new ways of doing things—things can be changed within the existing framework, but changing the framework requires revolution.
Death by a Thousand Cuts
What managers often fail to understand is how seemingly minor things can result in project failure: simple things like a few days’ delay in getting access to another required system. In the meantime, the developers must guess how that system operates, and a few weeks or months later, the guess turns out to be wrong.
Projects fail because a lot of little things go wrong, and by the time these minor things have made their way through the reporting structures and become visible, the project is headed for disaster. Insuring against this requires the invention of a reporting mechanism that will allow the implications of a set of little issues to reach the project decision makers much earlier.
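One hypothetical way to build such a mechanism is to log each small issue with a rough estimate of its schedule impact and escalate when the running total crosses a limit, even if no single issue would be reported on its own. The `Issue` type, the `should_escalate` function, and both thresholds are invented for this sketch; they are not drawn from any particular tool or method.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    description: str
    delay_days: float  # the team's rough guess at the schedule impact


def should_escalate(issues: list[Issue],
                    per_issue_limit: float = 5.0,
                    cumulative_limit: float = 10.0) -> bool:
    """Escalate if any single issue is large, or if the small ones add up.

    Both limits are hypothetical defaults chosen for illustration.
    """
    if any(issue.delay_days >= per_issue_limit for issue in issues):
        return True
    total = sum(issue.delay_days for issue in issues)
    return total >= cumulative_limit


if __name__ == "__main__":
    small_cuts = [
        Issue("test environment access delayed", 2),
        Issue("interface spec for another system missing", 3),
        Issue("guessed message format turned out wrong", 2),
        Issue("key reviewer on vacation", 1),
        Issue("build server flaky", 2),
    ]
    # No issue reaches 5 days on its own, but together they cost 10 days.
    print(should_escalate(small_cuts))  # True
```

The cumulative check is the part that matters: each cut looks too small to report, so a rule that only examines issues one at a time never fires.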
Late-Breaking News
Insurance also is needed for late-breaking news about project breakage. Typically there is ample early warning that things are not quite as they should be, but normally there is no one thing that is really specific, just a lot of little hints that something bad is likely to happen.
Although there is no simple mechanism for early detection of issues, the agile approaches have made it much easier. They have done this by taking the deadline—a much-loved project management tool—and making it a regular part of the day. Living with hard deadlines every day makes it virtually impossible for emerging problems to escape notice.
On the micro-scale, many agile teams that now use test-driven development, or a similar practice that keeps the code clean, have a daily milestone of checking integrated, tested, working code into the source code repository. With this daily milestone in place, teams can easily see that code quality is declining when the effort required to check in each day’s code starts to increase. Similarly, the amount of functionality added each day gives an early warning about progress, or the lack thereof.
Some teams may ignore this daily feedback, but with diligence, the discipline of the daily deadline constitutes an effective early warning system about declines in code quality and team productivity.
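As a minimal sketch of reading that daily signal, assume the team records how many minutes it takes each day to get its code integrated, passing its tests, and checked in. Fitting a least-squares line to the last two weeks or so and flagging a rising slope turns a vague feeling into a visible number. The metric itself, the function names, the ten-day window, and the one-minute-per-day threshold are all assumptions made for illustration.

```python
from statistics import mean


def trend_per_day(values: list[float]) -> float:
    """Least-squares slope of values against day index (units per day)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two days of data")
    x_bar = (n - 1) / 2  # mean of the day indices 0..n-1
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def checkin_effort_rising(minutes_per_day: list[float],
                          window: int = 10,
                          threshold: float = 1.0) -> bool:
    """Flag when check-in effort over the last `window` days grows faster
    than `threshold` minutes per day (both are hypothetical defaults)."""
    recent = minutes_per_day[-window:]
    return trend_per_day(recent) > threshold


if __name__ == "__main__":
    effort = [20, 22, 19, 21, 20, 24, 27, 30, 33, 37]  # hypothetical daily log
    print(round(trend_per_day(effort), 2))  # 1.86
    print(checkin_effort_rising(effort))    # True
```

A fitted slope is used rather than a day-to-day comparison so that one bad day does not raise an alarm. The same `trend_per_day` function could be applied to the amount of functionality added each day, where a falling slope is the warning sign.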
On a larger scale, the agile practice of having two-week to one-month development iterations provides an excellent deadline for measuring project progress. At the start of each iteration the team agrees with the key stakeholders about what team members will deliver, and at the end they show that the team has delivered what was agreed. This provides yet another early warning mechanism for all stakeholders about the ability of the team to deliver.
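The iteration-level check can be sketched the same way, assuming the team records, for each iteration, how many items it agreed with stakeholders to deliver and how many it actually demonstrated. The function names, the 80 percent floor, and the two-consecutive-iterations rule are hypothetical choices, not a standard agile metric.

```python
def delivery_ratios(iterations: list[tuple[int, int]]) -> list[float]:
    """Fraction of committed items actually delivered, per iteration.

    Each entry is (committed, delivered); committed is assumed to be > 0.
    """
    return [delivered / committed for committed, delivered in iterations]


def delivery_warning(iterations: list[tuple[int, int]],
                     floor: float = 0.8,
                     consecutive: int = 2) -> bool:
    """Warn when each of the last `consecutive` iterations delivered
    less than `floor` of what was agreed with stakeholders."""
    ratios = delivery_ratios(iterations)
    if len(ratios) < consecutive:
        return False
    return all(r < floor for r in ratios[-consecutive:])


if __name__ == "__main__":
    # (committed, delivered) for four hypothetical iterations
    history = [(10, 10), (10, 9), (12, 8), (12, 7)]
    print([round(r, 2) for r in delivery_ratios(history)])  # [1.0, 0.9, 0.67, 0.58]
    print(delivery_warning(history))                        # True
```

Requiring more than one short iteration before warning keeps a single unlucky iteration from triggering alarm, while still surfacing a sustained shortfall within a month or two.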
Focus on Becoming Flexible and Reactive
The most important bit of insurance a project can buy, however, is insurance against not comprehending that things are changing. A project needs to be viewed as a learning environment for everyone involved. If new information is not coming to light every day and being acted upon, no one is learning anything about the project’s true status, and the project is likely to fail. Constantly monitoring progress takes much of the drama out of uncertainty, and, therefore, much of the risk out of a project. Variability in project parameters is natural; being continually aware, informed, and responsive needs to be just as natural.
Rather than planning for specific, known risks, projects should work from the assumption that something will occur that impacts the project, and create mechanisms for the early detection of, and response to, such events. It is not the known things that cause us problems. It’s the unknown and unknowable that cause projects to crash and burn.
As an insurance policy, learning is rare but crucial to project success. Perhaps it was best summed up by Multics wizard Tom Van Vleck [5], who said, “You learn something every day, unless you’re careful.”
References:
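[1] Feynman, R. P., “Appendix F: Personal Observations on the Reliability of the Shuttle,” Report of the Presidential Commission on the Space Shuttle Challenger Accident, 1986.
[2] Boehm, B., Software Engineering Economics, Prentice Hall, 1981.
[3] “Slashdot effect.”
[4] Monbiot, G., Monbiot.com.
[5] Van Vleck, T.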

