Presentation to CEPA Incident Forum 2017
Member, Transportation Safety Board of Canada
28 September 2017
Check against delivery.
Slide 1: Title slide
Good morning. I am very pleased to have the opportunity to address members of the Canadian Energy Pipeline Association. I'd like to start by commending everyone here today for what you're doing: an initial, wide-open forum that will take a blunt, honest look at why things go wrong and have gone wrong. This kind of environment, where operators can privately share what they've learned, including best practices resulting from accidents and near-misses, is fantastic. It's also something that unfortunately does not always happen in other transportation modes.
So, congratulations. This forum is a tremendous opportunity to learn. And since it's always cheaper to learn from someone else's accident than it is from your own, I hope you all learn a lot.
The pipeline industry has achieved an enviable safety record. As you all know, there were no TSB-reportable accidents in 2015 and 2016, and the number of incidents had also fallen to new lows. Achieving such a good safety record is hard work; keeping it there is even harder. So far in 2017, there has been an uptick in both accidents and incidents reported to the TSB. In fact, for the first time in three years, the TSB launched two new investigations into pipeline occurrences this year. In the three years I have been on the Board, I have worked on only two pipeline reports. This year, I will get the opportunity to double my experience!
Slide 2: Outline
Most accidents and incidents, whether big or small, can be attributed to a breakdown in the way an organization identifies and mitigates hazards and manages risk. Why are some companies better at managing risk than others? The short answer is that it comes down to the safety culture of an organization. Today, I am going to provide my perspective on safety culture, its interaction with safety and integrity management processes, how our thinking about accident investigation has evolved, and where we would like to see industry headed in its journey toward zero incidents.
Slide 3: About the TSB
But first, a quick reminder of the TSB's mandate.
We advance transportation safety by conducting independent investigations into federally regulated modes of transportation. Our goal is to find out what happened, and why, so that hopefully, steps can be taken to prevent it happening again.
But: we are not a regulator, nor do we assign fault or determine civil or criminal liability.
Slide 4: A bit of history
Current regulatory "thinking" about organizational accidents and safety management systems started to be formed 40 years ago after catastrophic industrial accidents in the U.K. and Europe in the mid-1970s. And as you all know, the Piper Alpha oil rig disaster in 1988 set the stage for the "safety case," which has since been transformed into Safety Management Systems requirements by many regulators and standards-setting organizations in all modes of transportation in many countries.
Slide 5: NEB requirements for safety management
The formal requirements for safety management in the Canadian pipeline industry are meant to act as a framework to address systemic safety vulnerabilities before they result in an active failure, and to account for changing conditions that introduce new hazards and risks. The current requirements are mostly about having certain processes in place—for example, processes to identify and mitigate hazards, train and manage workers, monitor and evaluate progress, and continually improve performance. All of these processes are needed to manage risks, but they are not sufficient by themselves.
What is missing? Before I try to answer that, I want to provide a bit more background on safety management.
Slide 6: Three approaches to safety management (The person model)
There are three basic models that organizations use to deal with safety. The oldest of these, dating to the early 1900s, is the Person approach. You will all recognize the model; it has been commonly used to manage occupational safety. There is a well-known statistical relationship (Frank Bird, 1969) between each of the layers on the pyramid (1:10:30:600). The model is characterized by an emphasis on the person and on unsafe acts, and it tends to result in "blame, shame, retrain": the individual who made the mistake is blamed, shamed and retrained, and then the organization writes another procedure to ensure that someone else does not make the same mistake. (An aside: this is really where the rail industry is still at.)
Slide 7: The technical/engineering model
The next model is a technical or engineering approach, and dates from the 1940s. In this model, safety is engineered into the system. So there is an emphasis on process safety and reliability engineering. Humans are part of the system, and the man/machine interface is designed with the human in mind. This approach integrates human performance as part of the system. It has led organizations not only to try to understand why humans make mistakes, but also to develop processes such as threat-and-error management (TEM) and crew resource management (CRM) training. The engineering model also lends itself to review through audits and assessments. (An aside: this is how I think of the pipeline industry.)
Slide 8: The organization model
The third model is basically an extension of the engineering model, but it encompasses the whole organization. This model had its formal genesis in the 1980s. I recently came across an article, published in 2002, which referred to SMS as "an emerging way to look at organizational safety." The first SMS regulations in Canada were actually issued in 2001—for the federally regulated railways.
In the organization model, errors are regarded as symptoms of latent conditions within the organization: conditions that stem from management decisions and system design, including changes that may have been introduced after earlier occurrences. This approach requires proactive ways to identify and mitigate hazards in order to reduce the overall risk of the system. An organization uses data to create leading indicators that point to potential problems; it listens to the "weak" signals of potential problems (i.e., it connects the dots) and then it takes action. Safety decision-making is embedded in the organization, becoming, in the words of James Reason, "part of an organization's culture and the way people go about their work."
Slide 9: Safety, leadership, and culture
Now let's look at an organizational model to see how all the pieces fit together. In this model, the working interface is where all aspects of an organization, including its structure and decision-making processes, come together. A worker interacts with the processes, with the facilities, and with the equipment in order to get the job done. On the one hand, every organization must have safety-enabling processes—that is, processes that enable safe outcomes. These include processes to provide training and knowledge; processes to reduce exposure to workplace hazards; various policies, standards and operating procedures; and processes to recognize and mitigate hazards. A safety leader needs to understand these processes, how they are audited, and how effective they are in his or her organization.
On the other hand, organizations must also have processes that sustain the enabling systems—such as how people are selected and developed, how an organization is structured, how performance is managed, how decisions are made and so on. Just having the enabling systems is not enough. The organization must be capable of supporting and sustaining safe operations. For example, is safety given adequate emphasis through the structure of the organization? Does performance management adequately address the safety responsibilities of leadership? How are employees' mistakes handled? Are they treated as learning opportunities for the organization, or are the individuals punished? Are employees expected and encouraged to report close calls? The link between the sets of systems is the culture of the organization—the mostly unwritten rules of how things really work. This is where, despite the best enabling processes and sustaining systems, a workplace culture may negatively impact the workforce through mistrust, poor communication, or through lack of credibility in management. And finally, it is the leaders of an organization who drive both sides and who have the greatest impact on the culture.
Remember the question I posed earlier: What is missing? Here is the answer. Enabling systems are necessary, BUT by themselves are not sufficient to ensure good safety management. Why not? Because organizations must be capable of supporting and sustaining the enabling processes. Without the sustaining systems—including how decisions are made in an organization, how change is managed, how employees are hired and promoted, how they are supervised, how they are recognized and rewarded—the enabling processes cannot be optimally implemented.
And as I said a moment ago, overarching the whole structure is leadership. Leaders' beliefs, priorities, decisions and, above all, how they behave set the culture, including the safety culture.
Slide 10: Accident investigation: an evolution
Just as thinking about safety management has evolved, so too has the work of accident investigation, and how we think about accident causation.
It used to be, for instance, that the focus of accident investigation was on mechanical breakdowns. Then, as technology improved, investigators started looking at the contribution of behaviour and human-performance limitations. Nonetheless, people still thought things were "safe" so long as everyone followed standard operating procedures. Don't break any rules or regulations, went the thinking; make sure the equipment isn't going to fail; and above all, pay attention to what you're doing and don't make any "stupid" mistakes.
That line of thinking held for quite a while. In fact, even today, in the immediate aftermath of an accident, people on the street and in the media still think accidents begin and end with the thing that broke or the person involved. So they ask if an accident was caused by "mechanical failure" or "human error." Or they jump to false conclusions and say, "Oh, this was caused by someone who did not follow the rules," as if that were the end of it. Case closed.
But it's not that simple. No accident is ever caused by one person, one factor. And no one wakes up in the morning and says, "I think I'll have an accident today"—not ships' masters, not pilots, not locomotive engineers, not pipeline control room operators, and not company executives.
That's why our thinking had to evolve. And so it has become critical to look deeper into an accident, to understand why people make the decisions they make at all steps of their work. Because if those decisions and those actions made sense to the people involved at the time, they could also make sense to others, in future. In other words, if we focus too much on "maintenance error," for instance, we miss out on understanding the context in which maintenance staff were operating.
Slide 11: Pipeline findings
When I now look back on TSB findings for pipeline accidents, it is interesting to note that most of the causal and contributing factors were about things breaking. Perhaps we did not dig as deeply into the "why" 20 years ago as we would today. Very few of these "causes" refer to the role of the people in the system and their decisions.
But in rail—which is my background—we have found that the actions of the people involved are often a causal factor.
For instance, yes, a rail may break, causing an accident. But we find that there was often a human element involved, too, such as an employee who might not have spotted a pre-existing crack because he or she hadn't been trained to do the test that would have spotted it. Is that accident just a simple result of mechanical failure? Of course not.
So, my point is that, even in an industry/system as highly … engineered as pipelines, the human element is still very important in improving safety.
And so, the TSB now looks at organizational factors, systems issues, and … safety culture.
Slide 12: What is safety culture?
Safety culture has been defined in various ways, but how it is measured—how it is visible to others—is through the behaviours of individuals in an organization. You can't regulate safety culture. You can't buy it. You can't insist on it. It has to come from within an organization, and the tone is set at the top. That means the leaders: their beliefs, their priorities and above all how they behave. Because leaders' behaviour is emulated by others in an organization. What leaders don't say and don't do can be as important as what they say and do. These behaviours create the culture – the "way work actually gets done". This is sometimes expressed as "How things work around here," or "what people do when no one is looking."
Culture is embedded deep in an organization and takes a long time to change. Just changing a policy, for example, isn't enough. It is absolutely a first step, but it takes years of leaders demonstrating that they are following a new policy to get workers to believe there has been a real change. Once workers believe what management is saying, the culture will change.
Slide 13: Competing "top priorities"
Many organizations, of course, say that safety is their "top priority." However, there is plenty of convincing evidence that, for many of them, the real priority is profitability. That's not to say they consciously choose to be reckless or deliberately unsafe. It's just that, in the real world, they often have to balance many competing factors: safety, customer service, productivity, technological innovation, scheduling, cost-effectiveness, and return on shareholder investment.
Safety, however, should not be a "priority," it should be a value. That is, it's something you have to build into your organization … or you won't stay in business. It must be part of the consideration of everything that you do.
CEPA has made progress in dealing with the safety culture journey of its members, and I hope the soon-to-be-released Safety Culture Guidance document will be a valuable tool for all of you.
Slide 14: Assessing the outcomes
So, what does a good safety culture look like to the TSB, and how is it measured?
The answer is straightforward: It's measured by the behaviour of individuals in an organization.
Here are four examples of that behaviour, drawn mostly from the other three modes of transport we investigate.
Doing what you say you'll do
It starts with doing what you say you will do, for example, with conducting risk assessments.
After all, performing a risk assessment is a way to ensure that any operational change that is being undertaken is done safely. And when a risk assessment finds unmitigated hazards or risks, a company can then take steps to address these, even if doing so costs money.
Yet at the TSB we still see examples where some companies aren't conducting risk assessments before making operational changes. For instance, in the rail mode, when shipments of dangerous goods saw a staggering increase over a period of a few years, risk assessments were not conducted to assess the impact of the increase in traffic on track infrastructure. We also see an attitude of trying to justify why a risk assessment was not done, that is, justifying why it was "not required by a regulation," rather than welcoming the information gleaned from doing one.
A second key element of a good safety culture is that it enshrines its SMS processes within a just culture. What's a just culture? It's an environment that draws a clear distinction between simple human mistakes and unacceptable behaviour; one that does not immediately blame the worker, but seeks first to find systemic contributing factors.
At some companies, for instance, when something bad happens, there can be a tendency to blame and discipline the worker first, instead of looking at bigger issues like the workability of procedures, or training, or fatigue, or supervision.
Now, in order to know when you're looking at systemic issues, you just have to use the "substitution test." By that I mean ask yourself the following: could this have happened to someone else in the same circumstances?
In other words, look beyond the individual worker; instead, look at the system in which he or she is operating.
NAV CANADA, for instance, in discussion with its controller union, developed a "Just Culture" decision tree so that both parties could see the rationale for imposing discipline in those rare cases where it was warranted, instead of just punishing someone for "not following procedures."
A third element of a good safety culture is a Reporting Culture, that is, one where people feel "safe" to report issues and incidents because they know they will be treated fairly, and they know that the underlying issues will be addressed. In companies without a strong reporting culture (where, for example, an employee fears retribution for speaking out), employees will be less likely to report problems. The result? Those problems will remain unknown and unaddressed by the company.
The final element of a good safety culture is a learning culture.
Ideally, this is the natural outcome of an organization where employees trust that they can report problems. Because, once a problem is reported, management (and ideally the whole organization) asks, "What can be learned from this?" In other words, how do you use the data that's been reported in order for the company to learn and to grow?
Slide 15: Evaluating safety culture in the future: the conundrum of how far to go
Given that SMS regulations in Canada in all modes are still relatively new, the way they are being evaluated hasn't fully … matured. For example, regulators are still focused primarily on reviewing, assessing, and monitoring the enabling processes required by the various regulations. This goes back to what I said earlier, about asking whether an SMS exists. As opposed to asking: is your approach to managing safety actually working?
That's a tougher question, one that's even more challenging to evaluate. Why? Because the process is much more subjective, and for the most part, regulators have not created tools to evaluate it.
For example, how does one assess that safety is given enough emphasis through the structure and staffing of an organization? What staffing levels are appropriate? How do you know if, for example, there are too many vacant positions at one time? What about experience and competence? How do you measure whether performance management adequately addresses the safety responsibilities of leadership? How do you know if employee mistakes are treated as learning opportunities?
Thus, if the regulator does it at all, the evaluation of the "sustaining" side of an organization is more often a qualitative exercise—a matter of opinion more than one of having precise evidence.
However, if a regulator stops its evaluation after reviewing the enabling processes and the sustaining systems, they still won't know "why" the deficiencies exist. The "why" is about the safety culture.
The conundrum now faced by the regulator is this. It has created requirements for safety management processes. The success of those processes depends on systems such as organizational structure and staffing, which are the purview of management; those systems in turn depend on the safety culture of an organization, which stems from the behaviour of its leaders. So how far does the regulator go to assess an organization's ability to manage safety BEFORE an accident?
With respect to pipelines… When the NEB evaluates the safety management programs required by its regulations, it tries to examine the adequacy and effectiveness of the management processes and procedures employed. In your view, is their approach working? If not, what should be done differently?
Once an accident has occurred, the TSB's role is to investigate, and the investigation process for a significant occurrence will include a review of the organization and its ability to operate safely, and a review of regulatory oversight. The TSB maintains a Watchlist to shine a light on those issues that pose the greatest risk to Canada's transportation system. Safety management and oversight for air, rail and marine is on the 2016 Watchlist.
Slide 16: Words to consider
I want to leave you with a final thought, a quote that I know you have all seen, but that still rings true today. In 2013, on the 25th anniversary of the Piper Alpha accident, Lord Cullen said:
"No amount of regulations for safety management can make up for deficiencies in the way in which safety is actually managed. The quality of safety management…depends critically, in my view, on effective safety leadership at all levels and the commitment of the whole work place to give priority to safety."
Slide 17: Contact information
Slide 18: Questions
Slide 19: Canada wordmark