ISASI keynote 2016
Chair, Transportation Safety Board of Canada
20 October 2016
Check against delivery.
Thank you very much for that kind introduction and for the invitation to speak to you today. It is a real pleasure to be here.
This is my first time attending ISASI and I've been very impressed with the quality of presentations this week. There's clearly a lot of knowledge and experience in this room. I think we can be very proud of the valuable work performed by air accident investigators. We've become very good at understanding how accidents unfold, how one event links to another, and how—to invoke the theme of this conference—every link is important.
We've become especially good at identifying the operational, technical and human factors that contribute to accidents. Whereas in “the old days” we sought only to explain what happened, today we also focus on why.
For instance, it has now become standard to look at how people interact with automated systems and with one another … or whether a pilot or crew may have been task-saturated. Or fatigued. Or distracted. Because all of these elements can play a role.
But in order to keep improving, we need to continue to push the boundaries—of our knowledge, and of what we investigate—so that ultimately we can do a better job of advancing safety. And so today I'd like to highlight something else—two somethings—that we should be looking at.
Specifically, I'm talking about the organizational factors that contribute to accidents, and the regulatory environment in which those organizations operate.
Let me elaborate.
Many companies, for example, say that safety is their top priority. However, there is plenty of convincing evidence that, for many of them, the real priority is profitability. That's not to say they consciously choose to be reckless or deliberately unsafe. It's just that, in the real world, they often have to balance many competing factors: safety, customer service, productivity, technological innovation, scheduling, cost-effectiveness and return on shareholder investment.
That's a challenge for any business, and even though companies generally recognize and accept that products and services must be “safe” if they want to remain in business … those other priorities can exert a lot of pressure on management.
And when that happens, when we find deficiencies in how organizations identify, prioritize and manage their risks, we must ask ourselves: where was the regulator?
Because, ultimately, it is the regulator that is the guardian of public safety. Because the regulator sets the rules, determines the playing field, and creates the framework under which air carriers operate. And then—ideally—it is the regulator which provides the balanced oversight—whether in the form of inspections or audits—to make sure organizations are abiding by those rules and by that framework.
In Canada, safety management systems (SMS) have been mandatory for many large carriers for over a decade. SMS is also recommended by ICAO. It is a proven, internationally recognized tool to help companies identify and manage risk. In other words, to help them find trouble before trouble finds them. SMS, however, at least in Canada, has (for the most part) been mandated only for larger, scheduled carriers.
And although their adoption of, and transition to, SMS has not been without some bumps along the road, many of these companies have done a good job, and their SMS is, again for the most part, fairly mature and robust.
Many smaller operators, however, have faced a different reality. Some are not required to implement SMS at all. Others have done so only because they had to. And still more find themselves somewhere in the middle: trying to implement it, but without the experience, expertise, or resources to do it as well as necessary. It's a continuum, then—a spectrum. And so today I'd like to look at several examples along that spectrum, a trio of recent TSB investigations that highlight where companies, for various reasons, may find themselves. And in each case I'll also look at what the regulator either did—or still needs to do—to improve matters.
On March 13, 2011, a Boeing 737 was departing Toronto's Lester B. Pearson International Airport with 189 passengers and a crew of 7. During the early-morning take-off run, at about 90 knots indicated airspeed, the autothrottle disengaged after take-off thrust was set. As the aircraft approached the critical engine failure recognition speed, the first officer, who was the pilot flying, noticed an AIRSPEED DISAGREE alert and transferred control of the aircraft to the captain, who then continued the take-off.
During the initial climb, at about 400 feet above ground, the aircraft received a stall warning (stick shaker), followed by a flight director command to pitch to a 5° nose-down attitude. The take-off was being conducted in visual conditions, allowing the captain to determine that the flight director commands were erroneous. And so the captain ignored the flight director commands and maintained a climbing attitude. The crew advised the air traffic controller of a technical problem that required a safe return to Toronto.
Now, some may consider this “no big deal,” just something that occasionally happens, in this case, due to a failure in the pitot-static system. Yes, it resulted in inaccurate airspeed indications, stall warnings, and misleading commands being displayed on the aircraft flight instruments. Still, the pilots handled it effectively, and nothing serious came of it. Moreover, there was no damage to the aircraft, nor were there any injuries to those on board.
But what if the take-off had been in instrument meteorological conditions (IMC), when the captain could not have so easily determined which airspeed indicator was unreliable?
Let's look at this from an SMS perspective, one that is supposed to have proactive processes to identify and mitigate hazards, and reactive processes to learn safety lessons from incidents.
Approximately 7 months prior to the occurrence, Boeing issued an advisory to 737NG operators regarding flight crew and airplane system recognition of, and response to, erroneous main display airspeed situations.
The advisory, in somewhat dry language, said that an erroneous airspeed event might compromise a flight's safety, and described the issue as follows: “The rate of occurrence for multi-channel unreliable airspeed events combined with probability of flight crew inability to recognize and/or respond appropriately in a timely manner is not sufficient to ensure that loss of continued safe flight and landing is extremely improbable.”
So. Despite the manufacturer's warning that erroneous airspeed events were occurring more frequently than predicted—and that the flight crew training curriculum did not require recurring training for this—the operator didn't consider this a hazard that needed to be analyzed proactively by its SMS. The information, therefore, was not circulated to flight crews, nor did the operator consider taking any other action.
But what about after the occurrence? There, too, the operator's SMS didn't really “kick in”—because the occurrence was not recognized at the time as being sufficiently serious in nature to warrant calling in company safety personnel. Nor was it recognized as an occurrence that had to be reported to the TSB. Therefore, no immediate action was taken that would assist in an investigation—such as the preservation of flight data and cockpit voice recordings. In short, the effective performance of the crew masked the underlying risks.
Therein lies the problem, and we said so in our investigation findings: When neither an operator's proactive nor reactive SMS processes trigger a risk assessment, there is an increased risk that hazards will not be mitigated.
Which brings me to the question of oversight.
Because, as more and more operators transition to safety management systems, the regulator must recognize that those operators may not always identify and mitigate hazards as they should. The regulator—in this case, Transport Canada, but it could just as easily happen in another country—must adjust its oversight activities to be commensurate with the maturity of an operator's SMS.
I'd now like to look at a second example, from a different part of the spectrum.
On 19 August 2013, a Douglas DC-3C was operating as a scheduled passenger flight from Yellowknife, Northwest Territories, to Hay River, Northwest Territories. Just after lift-off, there was a fire in the right engine. The crew performed an emergency engine shutdown and, unable to climb, made a low-altitude right turn back towards the airport. The aircraft struck a stand of trees southwest of the threshold of Runway 10 and touched down south of the runway with the landing gear retracted. An aircraft evacuation was accomplished and there were no injuries to the 3 crew members or the 21 passengers.
Our investigation found that the fire was precipitated by a pre-existing fatigue crack in the right engine number 1 cylinder. But what made this investigation unique—aside from the fact that the operator had a crew that did a great job keeping the aircraft flying—was what we learned about the company's behavior prior to the occurrence:
Not only had this aircraft exceeded its maximum certified take-off weight, but this was a frequent company practice.
In fact, this practice had come to be accepted within the organization, so much so that it had become normal. Pilots didn't complete the required weight and balance calculations, nor did management press them to do so. Our report was blunt, citing the lack of a top-down safety culture.
So. When the practice of adjusting weight and balance calculations to maintain them within limits after departure is not just well-known, but also accepted by senior management, what does this mean for a company's SMS?
Put it this way: the existence of an SMS on paper—in binders on a shelf, for instance—is not the same as a robust, mature system to manage risks. For that, you need leadership. And if senior management doesn't support the SMS—that is, if it only exists on paper without a sincere commitment by the organization—then safety deficiencies can, and likely will, go unreported.
In a case like that, where there's a lack of buy-in from senior management, then the only defense is … once again … the regulator.
But in this case, we found that the regulator—Transport Canada, though it could have happened elsewhere—had an approach to oversight that focused on an operator's SMS processes almost to the exclusion of verifying compliance with the regulations. Even though the company was known to have had difficulties maintaining compliance. Even though the regulator had seen multiple examples of this.
And that, simply, is not good enough. Not when there is a broad spectrum of safety cultures out there, a whole range of companies in different stages of implementing SMS or with different abilities or levels of commitment toward it.
Our report said as much, too, pointing out that not only was the operator's SMS ineffective at identifying and correcting unsafe operating practices, but that the regulator's surveillance activities did not identify them either—and that “consequently, these unsafe practices persisted.”
I'd now like to look at a third and final example, which also falls elsewhere along this spectrum.
On May 31, 2013, shortly after midnight, a Sikorsky S-76 air ambulance helicopter crashed just after taking off in Northern Ontario, killing all four on board: the captain, first officer and two paramedics.
The cockpit voice recorder quickly told us what happened: After takeoff, as the crew turned toward their destination, they were turning into an area of total darkness, devoid of any ambient or cultural lighting—no town, no moon, no stars. With no way to maintain visual reference to the surface, they would have had to transition to flying by instruments. And although both pilots were qualified according to the regulations, they lacked the necessary night- and instrument-flying proficiency to safely conduct this flight. The result was that, as the crew began carrying out post-takeoff checks, the aircraft's angle of bank increased, and an inadvertent descent developed, one that the crew recognized too late—and at an altitude from which it was impossible to recover—before the helicopter struck the ground.
Our investigation found that the pilots were not adequately prepared to fly in the conditions they encountered that night, though our official list of findings showed causes and contributing factors that went well beyond the actions of the crew. These included deficiencies in company training. And in management staffing. And in supervision. As well as multiple deviations from company SOPs. On top of all this, we found that the regulator was aware of all these issues—that they had been for some time—and yet their “collaborative approach” to surveillance activities with a willing operator was ineffective at bringing the operator into compliance.
When looked at from an SMS perspective, this accident asks a simple yet confounding question:
When the regulator has significant concerns about an operator, as was the case here, when—and how—should it intervene? In other words, when is enough, enough?
In this case, the operator wanted to do the right thing. They weren't required to have an SMS, but they had seen its benefits and were trying to implement one nonetheless. They were willing, and they were trying. But they were also under-resourced, and time and again their efforts to identify and fix their own problems came up short. The regulator knew this. The inspectors knew this. But internally the inspectors felt that they had few options to enforce compliance, and so the deficiencies persisted. For months. Right up until the accident.
Now let's look at things from the regulator's point of view.
Over the past decade, their approach to oversight in Canada has moved away from a traditional “inspect-and-fix approach.” The new, preferred model is a systems-level approach whereby, in addition to verifying a company's compliance with regulations, its internal processes are examined to verify that there is also an effective system in place to proactively manage the risks associated with its operations. The theory being that, if this is done properly, such a transition should result in improved safety—addressing not only any identified problems, but also the reasons behind them.
And I agree. In theory.
But that theory, and the move away from the traditional inspect-and-fix approach, only works if all companies have
a) the ability to proactively identify safety deficiencies,
b) the capability to rectify them, and
c) a top-down, organization-wide commitment to doing so.
Does that sound like any operators you know? Sure. Some. Hopefully a lot. But not all. Because, again, there's a broad spectrum of capability, competence and commitment when it comes to implementing SMS—despite its track record of success.
Which brings me back to my question: when is enough enough?
Should the regulator wait for an accident before stepping in? Should it wait for operators to fix the problems that have been identified, or should it adopt a firm hand before then? If so, at what point?
Those are good questions—simple ones, yet perhaps confounding too. But they need to be asked, and they need to be discussed. If they're not, we lose out on an excellent opportunity to do what I said at the beginning of this speech—to push the boundaries of what we know and of what we investigate.
And this really is such an opportunity. Because the organizational factors I mentioned today, and the issue of oversight by the regulator, are appearing in our work with more and more frequency. They are, we have come to learn, important links in the chain—links that we need to pursue in order to better understand why accidents happen, and what needs to be done to make our skies, already very safe thanks to your good work, even safer.
Let's not waste this opportunity.