Myth Busting 24: Basing your reliability program on Root Cause Failure Analysis

Myth Busting Series

Root Cause Failure Analysis (RCFA, also called Root Cause Analysis) is great for eliminating the causes of failures. It’s usually used where there are major production, cost, safety, or environmental consequences. But it only deals with failures that have already happened: it is usually triggered by the very consequences you would have been better off avoiding altogether.


Root cause analysis works best for chronic (recurring) problems with a pattern that can be spotted through analysis. The causes can be isolated and eliminated. Every event is part of a chain of events. Change any point in the chain and you change what happens afterwards. Eliminate the cause and you eliminate the problem (consequences).

RCFA can be, and often is, used to deal with acute (major) problems that don’t occur often. Maybe the problem is quite rare and hasn’t happened before, but it led to major disruption. Since these are uncommon events, the combined probability and consequence may be low when viewed over a long time period. A good analysis can eliminate those nasty events in the future, but since they were rare, is the effort really worth it? Of course that depends on the nature of the failure consequences.
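One rough way to weigh that question is to treat risk as frequency times consequence and compare the loss avoided with the cost of the analysis. The sketch below is only an illustration: the function names and every figure in it are hypothetical, and it assumes the analysis eliminates the failure completely.

```python
def expected_annual_loss(events_per_year: float, cost_per_event: float) -> float:
    """Expected loss per year = event frequency x consequence per event."""
    return events_per_year * cost_per_event


def analysis_pays_off(events_per_year: float, cost_per_event: float,
                      analysis_cost: float, horizon_years: float) -> bool:
    """True if the loss avoided over the horizon exceeds the analysis cost.

    Assumption: the analysis removes the failure entirely.
    """
    avoided = expected_annual_loss(events_per_year, cost_per_event) * horizon_years
    return avoided > analysis_cost


# Hypothetical figures, for illustration only.
# Chronic problem: monthly failures, modest cost each time.
chronic_loss = expected_annual_loss(events_per_year=12, cost_per_event=5_000)

# Acute problem: once every 8 years on average, very costly when it happens.
acute_loss = expected_annual_loss(events_per_year=0.125, cost_per_event=400_000)

# A rare event with a large consequence can justify the analysis...
severe_rare = analysis_pays_off(0.125, 400_000, analysis_cost=30_000, horizon_years=5)

# ...while an equally rare event with a small consequence may not.
mild_rare = analysis_pays_off(0.125, 20_000, analysis_cost=30_000, horizon_years=5)

print(chronic_loss, acute_loss, severe_rare, mild_rare)
```

With these made-up numbers the chronic problem costs more per year than the acute one, even though each acute event is far more expensive. Money is only one measure of consequence. Safety or environmental consequences may justify the analysis regardless of the arithmetic.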

Regardless of which type of event you want to avoid, recurring or one-offs, root cause methods only work after the fact. They are reactive approaches to achieving high reliability. Why not be proactive and avoid the failures BEFORE you suffer the consequences?


Yes, RCM is superior. As a proactive approach, it is the best there is. You might think RCFA and RCM (Reliability Centered Maintenance) compete. After all, they are both aimed at delivering improved reliability performance. Indeed, they do deliver that, but I think of them as complementary.

RCM forecasts failures. Even in brand new systems, future failures are mostly easy to forecast. You can use your past history with similar systems. You can imagine the failures you may already be avoiding through your current Proactive Maintenance program. You can imagine failures that might be possible given how your new system is going to be used. RCM includes root cause analysis in order to determine how best to manage those failures. That built-in RCFA is quicker than stand-alone RCFA methods because, during RCM analysis, you already know the failure mode you are working on. In a sense, RCFA is embedded within RCM. The difference is that it is used before the fact.

Unfortunately, RCFA doesn’t forecast failures – it waits until they occur and then deals with them after the fact.


You want to avoid future problems wherever possible. The result of your proactive efforts should be sustained reliable performance. If you wait until after you have failures, are you not being irresponsible? Do your bosses in Operations (Production) really want you to fix problems only after they’ve experienced all the negative consequences of those failures? I doubt it.

Imagine getting on a brand new model of aircraft for one of its first flights. Would you feel comfortable if the airline boasted that it is developing its maintenance program using precise and rigorous Root Cause methods?

When to use RCM

RCM works incredibly well on systems being designed, as it does for aircraft. It won’t always catch all possible problems, but it catches most of them by far. That is better than waiting until after the problems arise, the planes crash, and then figuring out what to do to avoid it happening again! You can use it on new systems, modifications, and new equipment installations. What about your existing plant systems?

Even equipment and systems that have been in service a long time can benefit. I’ve performed RCM on very old (60+ years) systems where we discovered that over-maintaining was leading to some of the failures. That over-maintenance led to high costs, both for too many PMs and for the failures those PMs induced. On recently commissioned systems, you have an opportunity to precisely define your future maintenance program. Following up with good planning, you can also determine the support parts, etc. you will need to stock.


Of course RCM doesn’t come cheaply, but neither does RCFA. RCM is paid for up front; RCFA is paid for after you have failures. Often the costs and other consequences associated with failures are far more significant than the cost of the analysis itself. In the long term, RCM is far less expensive. Yet, like anything that delivers sustained long-term results, it requires an up-front investment. In both cases you’ll pay for training, facilitation and team members’ time. With RCM you will get more reliable performance, lower maintenance costs and increased output capacity. That last one is usually the real prize!

The way ahead

I prefer to base reliability programs on a combination of RCM and RCFA. Perform RCM as early in the life cycle of the asset as you can to maximize return. The earlier the better; ideally you’ll do it at the conceptual design stage. Then collect data on how the asset actually performs. If you are successful with your RCM failure forecasting and maintenance program, then you will be somewhat frustrated at the lack of failure data to analyze. That may disappoint statisticians, but isn’t that exactly what you want to happen?

If failures occur that you thought your RCM-based program should be avoiding, then use RCFA to fine-tune the program. Those failures will provide valuable data points for the analysis and you should have even greater success with your RCFA efforts. If you’ve done a good job with RCM though, you will find that you don’t need to do a lot of this either.