Are MRT disruptions “normal accidents”?

SMRT has just announced it will hire another 200 engineers by 2020, having already tripled the number of engineers since 2012, but the author says a complex problem is not solved by simply throwing numbers at it. TODAY file photo

Singapore’s Mass Rapid Transit (MRT) system has suffered disruptions of increasing frequency and severity in recent years. The latest one occurred on Oct 7 on the North South line, caused by the dramatic flooding of the tunnel between the Bishan and Toa Payoh stations.

Are such disruptions the new normal for the MRT system? Or as the sociologist Charles Perrow might ask, are they “normal accidents”?

Prof Perrow coined the term “normal accident” in his study of the 1979 Three Mile Island accident, in which cascading and interlocking technical and human failures resulted in a partial nuclear reactor meltdown. He argued that normal accidents are both inevitable and inherent in systems that exhibit “tight coupling” and “catastrophic potential”.

The elements of a tightly-coupled system – both technological and organisational – not only interact with each other but are interdependent. In fact, the different parts of the system are so enmeshed that it is usually necessary to shut down the entire system in order to fix errors or even to maintain it.

And because of tight coupling, small failures in different parts of the system quickly and unpredictably combine to create a major failure that cascades throughout the system.

Does the MRT system exhibit such characteristics that make it prone to normal accidents? A train system is not usually the first example that comes to mind when we talk about complex systems.

The underlying technology is mechanical and is governed by the laws of linear, Newtonian physics. And the overarching organisation is a top-down hierarchy that is focused on command and control, and operates on the principle of a clearly-defined division of labour. Just think of every boxes-and-lines organisational chart you have ever seen.

Beyond a certain threshold, though, a linear and predictable engineering system can tip into a non-linear and unpredictable complex one.

For a public train system, it happens when the number of components of the system increases: from a basic configuration of North South and East West lines in 1987 to one markedly different today in terms of new lines and interconnections.

Also, the sheer spike in passenger volume over the years has pushed the MRT system to operating at capacity, and in the process eroded SMRT’s ability to effectively monitor and govern the system.

If the MRT system is a tightly-coupled one prone to failures, then the rationale of its management must change. For a start, it calls into question the meaningfulness of the apologies from the Minister for Transport, the SMRT Chairman, the SMRT Group CEO and the SMRT Trains CEO after the flooding debacle.

Other than as a gesture aimed at assuaging the public’s anger, is it reasonable to expect the senior management of a complex organisation to be responsible for the minutiae of its operations? In a complex system, can the buck actually stop anywhere?

And if you accept the logic that there is no single cause of failure in a tightly-coupled system, then it is equally meaningless (even if convenient) to blame the maintenance crew for the failure of the water pumping system that led to the flooded tunnel.

By the same token, if one cannot (solely) blame the senior management for normal accidents, then when things go well neither should the top brass be so quick to claim undeserved credit.

In The Black Swan, Nassim Nicholas Taleb points out the asymmetry in how experts and managers interpret their roles in random events: when things go right, they put it down to their knowledge and skill, but when things go badly, it is always the situation and bad luck that are to blame. What is sauce for the gander must also be sauce for the goose.

BUILDING SOME FAT

Most importantly, the growing complexity of the MRT system renders pre-existing standard operating procedures useless. Micro failures, even the familiar ones, start to combine in unexpected ways so that the resulting major disruptions do not happen the same way twice.

The default is no longer “normal” operations with predictable deviations; rather, it is a perpetual onslaught of novel uncertainties and chronic surprise that keeps the managers and the rank-and-file in constant troubleshooting mode.

The routines that serve as a respite from the stresses of having to improvise solutions to continuously mutating problems begin to unravel. Hence, operations become maintenance, and maintenance becomes operations.

For most of its existence, our MRT system was known both for its leanness and reliability. With growing complexity and increased usage, its leanness has paradoxically created unreliability.

In his classic book Chaos: Making A New Science, James Gleick recounts the engineers’ perennial challenge of reducing noise in the telephone wires used to transmit information between computers. The engineers found that no matter how strong they made the signal, some residual noise could never be eliminated.

Eventually, the engineers settled for a modest signal, accepted the inevitability of errors, and used a strategy of redundancy to compensate for them. In other words, they used more transmitters. Inelegant and crude, and far from optimal, but ultimately more effective than the elusive search for the error-free and efficient signal.
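The arithmetic behind that trade-off can be sketched in a few lines. This is an illustration only, not the method described in Gleick's book: it assumes each copy of a message is corrupted independently with the same probability, and that the receiver takes a simple majority vote across copies. The function name `majority_error` and the 10 per cent error rate are made up for the example.

```python
from math import comb


def majority_error(p: float, copies: int) -> float:
    """Probability that a majority vote over `copies` independent
    transmissions is wrong, when each copy is corrupted with probability p.

    Assumption (not from the article): errors are independent and identical.
    """
    if copies % 2 == 0:
        raise ValueError("use an odd number of copies so the vote cannot tie")
    needed = copies // 2 + 1  # smallest number of bad copies that outvotes the good ones
    return sum(
        comb(copies, k) * p**k * (1 - p) ** (copies - k)
        for k in range(needed, copies + 1)
    )


if __name__ == "__main__":
    # Hypothetical 10% chance that any single copy is corrupted.
    for n in (1, 3, 5):
        print(f"{n} copies: error probability {majority_error(0.1, n):.5f}")
```

Under these assumptions, a channel that garbles one copy in ten garbles roughly one majority vote in 36 with three copies and fewer than one in 100 with five. The noise on each individual copy is untouched; only the duplication changes the outcome.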

The key lesson for the MRT system is to adopt a similar strategy of redundancy and to build some “fat” back into the system in terms of duplications, back-ups, and especially loose coupling so that failures are quarantined and do not transmit throughout the system.
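To make the idea of quarantining failures concrete, here is a toy sketch. Everything in it is hypothetical: the component names, the dependency map and the `affected` helper are invented for illustration and do not describe the actual MRT architecture. It simply traces which components go down when one fails, first with every dependency transmitting the failure and then with one link decoupled.

```python
from collections import deque


def affected(dependents, start, isolated=frozenset()):
    """Return the set of components that fail when `start` fails.

    `dependents` maps a component to the components that rely on it.
    Links listed in `isolated` (as (upstream, downstream) pairs) are treated
    as loosely coupled and do not pass the failure on.
    """
    failed = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if (current, nxt) in isolated or nxt in failed:
                continue
            failed.add(nxt)
            queue.append(nxt)
    return failed


# Hypothetical dependency map, for illustration only.
dependents = {
    "pump": ["tunnel"],
    "tunnel": ["signalling", "track_power"],
    "signalling": ["line_service"],
    "track_power": ["line_service"],
    "line_service": ["interchange"],
}

tight = affected(dependents, "pump")
loose = affected(dependents, "pump", isolated={("line_service", "interchange")})

if __name__ == "__main__":
    print("tightly coupled:", sorted(tight))
    print("one link decoupled:", sorted(loose))
```

In the tightly-coupled case the pump failure reaches all six components; with the single link between the line and the interchange decoupled, the interchange stays up. The cost, which the sketch does not model, is whatever buffer or duplicate makes that decoupling possible.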

A tightly-coupled system is high-performing because the moving parts are integrated, but it requires keen coordination and is fragile.

Making the system more loosely coupled will make it more resilient, but at the cost of seamlessness and efficiency.

Finally, we need to ask if organisational structures and processes are appropriate for dealing with complexity.

It takes complexity to deal with complexity; traditional hierarchies and centralised command systems do not just fail to keep up with normal accidents, but contribute to them.

SMRT Corporation, the operator at the heart of “flood-gate”, has just announced it will hire another 200 engineers by 2020, having already tripled the number of engineers since 2012.

But a complex problem is not solved by simply throwing numbers at it. What matters more is whether the engineers and other resources are organised for adaptiveness and troubleshooting in a complex and non-linear world.

ABOUT THE AUTHOR:

Adrian W J Kuah is Senior Research Fellow at the Lee Kuan Yew School of Public Policy.

 
