Do you have to deal with complex, multi-faceted problems? Do you have to deal with conflicting stakeholders? Are you faced with seemingly intractable problems? Have yesterday’s solutions turned into today’s problems? If so then you may need to use Systems Thinking to help understand today’s increasingly complex world. By looking at the whole picture Systems Thinking allows us to solve problems with greater acuity, plan more effectively, and organise ourselves and our companies more efficiently.
Earlier this year I was privileged to be invited by the Chief Scientific Adviser to the Department for Transport to an internal seminar on Systems Thinking. Privileged not only as one of the few external invitees but also to hear some fascinating presentations on the work going on in this complex but vital area.
Professor Dave Cliff from the Department of Computer Science at the University of Bristol took us through the background. A national research and training initiative in the science and engineering of Large-Scale Complex IT Systems (LSCITS) was set up in 2007 with headline funding of £15m.
The programme is deemed necessary because, at some point in the not-too-distant future, the growing complexity and scale of large IT systems will exceed our capability to manage them.
The team is working on the intersections between complexity in organisations, socio-technical systems engineering, high integrity software engineering, predictable software systems, novel methods and the mathematical foundations for all of these.
But how do IT systems become large-scale and complex?
Firstly, most system development is “Brownfield” rather than “Greenfield”. Very few IT systems are commissioned as standalone “clean-sheet” engineering. Most systems have to slot into a network of interactions with existing systems. The constraints imposed by these existing systems often vastly outweigh the system’s formal functional and non-functional requirements combined. Thus the systems become “contaminated by complexity”.
David Collingridge in his book The Management of Scale, written for students of corporate strategy and production management, presents a set of case studies:
· USA space shuttle
· Nuclear power
· UK North Sea oil
· Large irrigation schemes in developing countries
· High-rise system building in the UK
All of these went massively over-budget, were badly delayed, and failed to deliver. Only North Sea oil exploration gave a positive RoI but through sheer luck. The common themes were technology changes that were too big, too rapid and came with a very long lead time. But too big, too rapid for what? For the large highly centralised organisations that managed them. This argues for “incrementalism”, i.e. that organisations need to decentralise; learn from experience and mistakes; be able to proceed in small steps; and be able to adaptively change tack.
Other authors have considered these costly failures before.
In Normal Accidents Charles Perrow looked at Apollo 13, Three Mile Island, and others. He showed that failings in various parts of a system are bound to interact in unpredictable ways. Failures rapidly run ahead of attempts made to limit their consequences. But this only goes so far, as his analysis is largely descriptive and post-hoc. Problem technologies can be identified with the benefit of 20:20 hindsight, but we need predictive theories.
In The Challenger Launch Decision Diane Vaughan analysed the events (mainly in the final 24–48 hours) leading to the launch of Challenger. It was written after publication of the presidential commission’s accident-investigation report. Her key claim concerns a social dynamic or process: the Normalisation of Deviance.
William H. Starbuck & Moshe Farjoun edited Organization at the Limit: Lessons from the Columbia Disaster. This volume had multiple contributors, including Vaughan (who noted major similarities with Challenger). The Columbia Accident Investigation Board explicitly noted causal socio-technical factors. Later chapters begin to discuss the need for Resilience Engineering.
“Resilience can essentially be thought of as the persistence of service delivery that can justifiably be trusted when facing changes… The notion of the delivery of service that can justifiably be trusted is often referred to as dependability, an emergent property of systems that encompasses availability, reliability, integrity, confidentiality, safety, and maintainability… So resilience can be defined as the persistence of dependability when facing changes.” Baxter, Rooksby, & Sommerville 2009
Failure need not be a total loss (“outage”). Outage costs per hour are an appealing & striking metric, but “… a system failure has occurred when a direct or indirect user of a system has to carry out extra work, in response to some system behaviour, over and above that normally required to carry out some task.” Baxter, Rooksby, & Sommerville 2009
Case studies have great value in the pre-theoretical stage, but we need predictive theories for this to be proper engineering. Issues of designing for resilience, maintaining resilience, and coping with failure are at the research frontier of this new field.
John Seddon has taken ideas from Japanese manufacturing and sought to apply them to UK public services. Influenced by Deming’s “Out of the Crisis” and the Toyota management philosophy, his Freedom from Command & Control (2003) applies ideas from Japanese manufacturing engineering to service organisations. His later work Systems Thinking in the Public Sector (2008) is explicitly, unashamedly aimed at UK public-sector service provision. He argues that UK public-sector services have massive structural flaws, stemming from a hierarchical, command-and-control, target-driven mind-set. Compliance with externally-set specifications and targets creates failure conditions. This approach incurs five types of waste:
· Costs of writing specifications for standards and targets
· Costs of inspection
· Costs of preparing for inspection
· Costs of specifications being wrong
· Costs of demoralisation
Maybe we need new ways of architecting large-scale services, regardless of IT?
People, acting individually or in groups, play a vital role in both service delivery and service innovation, even when the service system is heavily dependent on technology infrastructure and/or highly automated. The field of socio-technical systems engineering (STSE) is in its relative infancy. In STSE, the social entities (individual humans, project teams, corporate divisions, or entire organisations) that interact with the technology system are rightly viewed as core components, inside the system boundary. As service systems grow in size, the need for achieving co-ordination of socio-technical processes and resources across and between organisations becomes critical: in some cases, dealing with the sheer scale of the system is a primary issue. In large and complex systems involving multiple organisations, issues of human behavioural psychology and organisational culture demand increased attention.
A system is complex if it is composed of interacting components that are (at some level of analysis) individually simple and that have simple interactions and yet where nonlinearities in the components and their interactions compound across the entire system in such a way that the overall system-level behaviour is difficult or impossible to predict (i.e. emergent) even given perfect knowledge of the components and their interactions.
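This definition can be made concrete with a toy simulation (my own illustrative sketch, not from the LSCITS programme): a handful of coupled logistic maps. Each component is individually simple (one quadratic update) and each interaction is simple (weak mixing towards the mean), yet the aggregate behaviour is chaotic, so two runs that differ by one part in a billion in a single component soon diverge completely.

```python
# Toy illustration of emergence: four coupled logistic maps.
# Simple components + simple interactions -> unpredictable whole.

def step(state, r=3.9, eps=0.1):
    """One tick: logistic update per component, then weak global mixing."""
    fx = [r * x * (1 - x) for x in state]
    mean = sum(fx) / len(fx)
    return [(1 - eps) * y + eps * mean for y in fx]

a = [0.20, 0.40, 0.60, 0.80]
b = [0.20, 0.40, 0.60, 0.80 + 1e-9]   # minutely perturb one component

gap_at = {}
for tick in range(1, 101):
    a, b = step(a), step(b)
    if tick in (10, 100):
        gap_at[tick] = max(abs(x - y) for x, y in zip(a, b))

# The gap between the two runs typically grows by several orders of
# magnitude: "perfect knowledge" of the start is not enough to predict.
print(gap_at)
```

Even with the update rule fully known, long-range prediction fails in practice, which is exactly the sense in which the system-level behaviour is emergent.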
Duncan Kemp, Lead Systems Engineer, Rail Systems at the DfT gave his perspective. There are three aspects of systems thinking:
· Critical thinking
· Multiple perspectives
· Systems theory
Critical thinking involves following a cycle of observation, hypothesis, experiment, analysis and conclusion, while avoiding confirmation bias, attribution bias, over-trusting testimonial evidence, filling in memory gaps, accepting authority without question, generalising from too few observations, mistaking coincidence for causation, and failing to admit ignorance.
Multiple perspectives – more than one of which can be true – is based upon Checkland’s “Systems Thinking, Systems Practice” (1981). It involves applying standard systems thinking to human activity systems and helps to understand tacit differences in understanding of the system.
For example, is a railway system
1. A business that makes a return on investment for its shareholders,
2. A low carbon and low congestion transport system for the economy, or
3. An organisation to employ the brothers?
There is a useful German word for such different perspectives, Weltanschauung, and communicating across different Weltanschauungen is extremely difficult. For example, the optimum capacity of the railway would be quite different under each Weltanschauung. A possible solution lies in the interrelationships. Based upon Forrester’s work on system dynamics and Senge’s “The Fifth Discipline”, we can use simple mathematical models to explain apparently complex behaviour. This helps us understand interrelationships between apparently unconnected issues.
Take the thorny question of transport safety. If investment goes into improvements in rail safety, rail fatalities will fall. However, unless more investment also goes into passenger comfort on the railways, more people will travel by road, so road fatalities will increase. The inner loop of rail safety dominates the behaviour of the system because it is simpler and because it works within the strong ‘rail silo’, while the outer transport-safety loop is complex and poorly understood. This is exactly the same dynamic as drug addiction, the mega-project ‘conspiracy of optimism’, and some system integration problems.
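The two loops can be sketched as a toy stock-and-flow model in the spirit of Forrester’s system dynamics. Every number below is invented for illustration, and this is not a DfT model; the point is only that the same rail-safety spend can yield more total fatalities when the outer (comfort and modal-shift) loop is neglected.

```python
# Toy system-dynamics sketch of the rail-safety vs transport-safety loops.
# All figures are hypothetical; road travel is assumed roughly 10x more
# dangerous per passenger-km than rail.

RAIL_RISK = 0.03   # fatalities per bn passenger-km on rail (invented)
ROAD_RISK = 0.30   # fatalities per bn passenger-km on road (invented)

def simulate(safety_invest, comfort_invest, years=10):
    """Return (rail, road) fatalities per year after `years` of investment."""
    rail_km, road_km = 60.0, 700.0   # bn passenger-km, invented starting stocks
    rail_risk = RAIL_RISK
    for _ in range(years):
        # Inner loop: safety investment steadily cuts the rail fatality rate.
        rail_risk *= (1 - 0.05 * safety_invest)
        # Outer loop: if comfort lags, travel drifts from rail to road.
        shift = rail_km * 0.02 * (1 - comfort_invest)
        rail_km -= shift
        road_km += shift
    return rail_km * rail_risk, road_km * ROAD_RISK

rail_a, road_a = simulate(safety_invest=1.0, comfort_invest=1.0)
rail_b, road_b = simulate(safety_invest=1.0, comfort_invest=0.0)

# Rail fatalities fall in both scenarios, but neglecting comfort shifts
# travel onto the roads, so total fatalities end up higher.
print(rail_a + road_a, rail_b + road_b)
```

The rail-silo metric (rail fatalities) improves in both runs, which is precisely why the simpler inner loop dominates decision-making while the whole-system outcome quietly worsens.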
In summary, Systems Thinking helps you understand the problem space. It helps you avoid solving the wrong problem – or only part of the problem. It helps you understand different perspectives and makes systemic problems more tractable. It also helps you identify potential problems before they occur.
Copyright David C Pearson 2010 All rights reserved