This question was submitted to me today by a colleague through LinkedIn, and I thought it would be appropriate to address it here right away instead of by email, as I usually do.
Email: “Hey man, I like your blog posts. They look great. Hope all is going well for you. I have a favor to ask. We are looking into the value of doing “lower level” evaluations on Human Performance incidents. Currently we do an RCI or ACE based on the incident, but we are considering doing something less than a full ACE for incidents like (and these are just possible examples, we haven’t landed on the criteria yet) a cut in error that drops only one customer or an OSHA recordable that is a very minor injury with known causes. (we can argue that one all day I know) What I really want to know is do you see people doing lower level evaluations at different places. In my most previous work prior to coming to [name withdrawn] at [name withdrawn], we did what we called a “Human Performance Evaluation” however, based on conversations with people there, they no longer do them and simply do ACE investigations or nothing. Any insight you could provide would be helpful. Thanks.”
Okay, for those wondering, RCI = Root Cause Investigation and ACE = Apparent Cause Evaluation. (Some places call RCIs “RCEs,” for Root Cause Evaluations.)
Depending on your program, Root Cause investigations carry the most significance. Many institutions call them Level 1. They typically involve a team of investigators, each with a separate mission, and have a 30-day clock on them.
The primary elements for each root cause evaluation are as follows:
- Team Assignment
- Development of the Problem Statement
- Evaluation and Analysis
- Development of Corrective Action
- Report Writing
- Report Approval
- Effectiveness Review
You must determine the most basic reason for a failure, problem, or deficiency, which if corrected, will prevent recurrence. The Root Cause must meet these three criteria:
- The problem would not have occurred had the root cause not been present.
- The problem will not recur if the root cause is corrected or eliminated.
- Additionally, correction or elimination of the root cause should prevent recurrence of similar conditions.
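The three criteria above can be read as a simple all-or-nothing test. Here is a minimal sketch of that idea in Python. The `CandidateCause` structure, its field names, and the example cause are my own illustrations, not part of any particular company's procedure, and in practice each answer is a judgment call made by the investigation team, not a checkbox.

```python
from dataclasses import dataclass


@dataclass
class CandidateCause:
    """A cause proposed during an investigation (hypothetical structure)."""
    description: str
    problem_absent_without_cause: bool    # criterion 1
    correction_prevents_recurrence: bool  # criterion 2
    correction_prevents_similar: bool     # criterion 3


def is_root_cause(cause: CandidateCause) -> bool:
    """Return True only when the candidate meets all three criteria."""
    return (
        cause.problem_absent_without_cause
        and cause.correction_prevents_recurrence
        and cause.correction_prevents_similar
    )


# Illustrative example: a cause that explains this event but would not
# prevent similar conditions elsewhere fails the third criterion.
candidate = CandidateCause(
    description="Step skipped in switching procedure",
    problem_absent_without_cause=True,
    correction_prevents_recurrence=True,
    correction_prevents_similar=False,
)
print(is_root_cause(candidate))
```

A candidate that fails any one criterion may still be a contributing or apparent cause. It just does not qualify as the root cause under this test.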
A Level 2 would be an ACE, which is typically done with one investigator, and also has a 30-working-day clock on it. They can usually be completed within a couple of weeks, depending on backlog, with no extensions required.
An ACE consists of a systematic approach to determining the apparent cause(s) and recommended corrective action(s) of human, programmatic, organizational, and/or equipment performance problems. The ACE is not an RCI. The major difference is that an RCI is a more thorough evaluation that considers all causal factors and provides a logical determination of the root cause(s) that affect the event. Although both investigations are focused on the causal factors for the inappropriate acts and equipment failures, an ACE does not evaluate every cause of the problem and does not pursue the causal factors to the most fundamental level.
So, what about a Level 3 or Level 4? A (lower) Level 3 is most likely what we’re trying to sort out here (you can jump below now if you’re already familiar with Corrective Action Programs). Some places say that Level 4 is a “broke/fix” model that documents that something was wrong but is now fixed, and that no further action is required. Level 4s are also often “Closed to trend,” and if enough similar problems stack up (using trend codes), a Common Cause evaluation can be initiated that looks at the aggregate.
Let’s pause here for a moment and consider that a corrective action program needs two parts to function. Number 1: a way (software) to report an error or event. Number 2: a screener or screening committee to determine its significance level and whether it warrants investigation. Note that this is not an “Investigation and Evaluation Program.” The system is designed to develop effective Corrective Actions and to trend certain components of investigations for metrics and, potentially, even more corrective actions.
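To make the “Closed to trend” idea concrete, here is a rough sketch of how trend codes on closed reports might be counted to flag a Common Cause. The code names and the threshold of 5 are assumptions I made up for illustration. Your program would define its own trend codes, its own threshold, and most likely a rolling time window as well.

```python
from collections import Counter

# Assumed threshold: how many reports sharing a trend code trigger a look
# at the aggregate. This number is illustrative, not an industry standard.
COMMON_CAUSE_THRESHOLD = 5


def codes_needing_common_cause(trend_codes, threshold=COMMON_CAUSE_THRESHOLD):
    """Return the trend codes that have stacked up to the threshold or beyond."""
    counts = Counter(trend_codes)
    return sorted(code for code, n in counts.items() if n >= threshold)


# Hypothetical trend codes taken from reports that were closed to trend.
closed_reports = [
    "PROCEDURE_USE", "PEER_CHECK", "PROCEDURE_USE", "LABELING",
    "PROCEDURE_USE", "PROCEDURE_USE", "PEER_CHECK", "PROCEDURE_USE",
]
print(codes_needing_common_cause(closed_reports))
```

Each individual report cost almost nothing to close, but the count is what tells you when the pile deserves an investigation of its own.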
Why do an Investigation?
Let’s first consider why we do an investigation in the first place: to develop appropriate corrective actions to prevent this from happening again. Were you thinking I was going to say to determine who is at fault? I hope we’re beyond that way of thinking at this point. In case it’s not crystal clear at this point, effective corrective actions are the goal. We can never forget this when addressing questions like this one.
A root cause “Corrective Action to Prevent Recurrence (CAPR)” is designed to prevent this event from ever happening again throughout the organization. One way to look at a well-designed Apparent Cause corrective action could be that it is created to prevent that specific mistake from being made by the committing department ever again in similar situations.
Corrective actions are also supposed to be SMART (yes, the same acronym used for goal setting). If you have an entire structure and workflow in your company for an effective Corrective Action Program (CAP), then you also have a Corrective Action Review Board (CARB) that reviews how realistic, intrusive, and effective the proposed actions are. These must be carefully considered, because they cost money and typically add more to the existing system, which is a burden. I’ve coined a term for this: “PLUS 1”. Think back: how often do the corrective actions you’ve known about or created remove things from the system, or do nothing? Typically, management expects some new action to prevent an event from happening, so we constantly add until adding too much overwhelms the system, creating all new problems. I encourage you to look into my post or other sources on INPO’s “Cumulative Impact” document to further your knowledge on this.
That’s the background. Now, what does a “Level 3” type of investigation entail? Much of the work should already be done before this priority is established at the screening review.
Some places call them Prompt Human Performance Investigations, Human Performance Evaluations, or simply a Human Performance Checklist. By the time the reporting tool generates a “Condition Report” for a screener or screening team to prioritize, much of this information may have been gathered. This process ensures it will be gathered before too much time has passed. These investigations are done up front, or “promptly,” usually by the supervision of the crew that discovered the problem, and can most likely already determine whether more investigation needs to be performed. If not, the event does not need an ACE (Level 2) or an RCI (Level 1).
The documentation/evidence to be included to help make decisions on further actions may include:
- Physical evidence related to the incident, e.g., broken or damaged parts
- Photographs of observed failure
- Supporting documents, including laboratory test reports, sample analysis results, personnel interview notes, recorded measurements, and vendor data
- Logs, Records, Alarm printouts, data acquisition reports, etc.
- Telephone conversation notes between vendors, industry experts and the Root Cause Analysts
- A list of references from which key data is obtained and key assumptions are based
- Observations from involved personnel
- Previous history of the failed component
- Similar experiences at other places.
This evaluation checklist tool provides a structured assessment of the human performance error to ensure that its underlying cause is understood, ideally from the people directly involved, who know more about the issue than anyone else. The checklist asks the important upfront questions about what happened, who was involved, who can be contacted from management, and what information is known up to this point in the workflow. Nuclear screening procedures already have criteria to determine up front if an event should be dealt with as a Level 3. (I recommend asking your nuke buddies for these criteria as a starting point, as they may be vastly different from company to company.) I have recently done an assessment for an energy company that screens everything as Apparent Cause but calls everything Root Cause, and all of the investigations are performed strictly by leadership team members. In other words, nothing is screened, and that’s a problem.
Note: Some places have a quarantine procedure to implement following an event, before the level of investigation is determined.
In nuclear, yes, I’ve seen lower level Human Performance Evaluations screened out as Level 3 and 4 because of lack of risk, consequence or impact. I have not come across that style of coding in other energy sectors at this point, mainly because their Human Performance Programs are either fledgling, or haven’t matured to the point of having a comprehensive reporting system and screening review, yet. You are asking the right questions to move a program ahead.
Create four levels of condition reports and screen events down to the lower ones as long as there is a threshold of general acceptance for minimal effort (Level 3) or no effort (Level 4). These levels fit a non-event you do not plan on learning much from, or one whose corrective actions were entirely self-revealing or already implemented, so that nothing more needs to be done. Don’t waste resources on investigation time that could be spent being proactive doing other work.
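If it helps to picture the screening step, here is a minimal sketch of a four-level screen. Everything in it is an assumption for illustration: the `significance` ratings, the two yes/no questions, and the way they map to levels are placeholders I chose, not criteria from any real procedure. Your own screening criteria will differ.

```python
from enum import IntEnum


class Level(IntEnum):
    RCI = 1                # Root Cause Investigation
    ACE = 2                # Apparent Cause Evaluation
    PROMPT_EVALUATION = 3  # minimal effort, e.g. a human performance checklist
    CLOSE_TO_TREND = 4     # broke/fix, no further effort


def screen(significance: str, causes_known: bool, actions_complete: bool) -> Level:
    """Assign an investigation level to a condition report.

    The mapping below is an assumed example, not an industry rule.
    """
    if significance == "high":
        return Level.RCI
    if significance == "medium":
        return Level.ACE
    if significance != "low":
        raise ValueError(f"unknown significance: {significance!r}")
    # Low significance: close to trend only when the causes are known and
    # the corrective actions are already implemented.
    if causes_known and actions_complete:
        return Level.CLOSE_TO_TREND
    return Level.PROMPT_EVALUATION


# Example: a minor event with known causes that has already been fixed.
print(screen("low", causes_known=True, actions_complete=True).name)
```

The point of writing it down this plainly is that the screener's decision becomes repeatable, and anyone can see why an event landed at the level it did.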
I love what Dr. William Corcoran said in his document entitled, “The 13 Steps of a World Class Corrective Action Program”: “Organizations that optimize their learning from experience will outperform those that do not. Those that do not are digging their own graves by making it easier for their competition.”
Thanks for asking the question, and keep them coming! Have an event-free day!
This response was primarily brought to you by an amalgamation of old procedure documents from multiple companies I’ve worked with in the past, and DOE online resources.