Operational Resilience: Documenting Your Extreme Yet Plausible Scenarios in One Sentence
Published by Ben Saunders - OpRes Founder
Roughly a 2-minute read
“As an operational resilience SME, I want to view my workload distribution, so that I can understand concentration risks with cloud service providers across my important business services.”
If you are familiar with agile software delivery practices, you will be used to writing user stories that identify the actor, action, and outcome a software feature must deliver. For the last few months, drafting user stories and refining our product backlog has been a constant at OpRes HQ. This got us thinking: “How can we apply agile software delivery practices to a detailed and thorough activity such as scenario testing for operational resilience?”
So we took the idea of a user story and considered how we could apply the same concept to identifying an extreme yet plausible disruption to an important business service. In reality, a single business service could experience any number of disruptions, whether from technology failures, malicious third-party attacks, or financial and economic fluctuations, and even from political or climate-driven interference. Indeed, the same scenario could be plausible across multiple important business services!
As part of your firm's identification of extreme yet plausible scenarios, you will no doubt need to perform some analysis and ask yourselves specific questions, such as:
If this business service were disrupted, would it result in long-term damage to the firm, our customers, or the wider market?
Why does a specific disruption event result in a heightened level of concern to the firm?
If this disruption event were to occur, when would it result in the most damage to the firm, our customers, or the wider market?
Are there any single points of failure across our important business services, whether technology-, people-, or facility-related?
Where there are single points of failure, how hard would it be to recover normal levels of service in a timely manner?
Do we have the required recovery strategies in place where single points of failure are prevalent across our important business services?
Once your firm has started to surface and document answers to these questions, it is time to consolidate your findings and refine the scenario into a simple, one-sentence assertion. Just as a user story in agile software development is underpinned by three components, we believe a similar structure can be applied to scenario planning for operational resilience. By applying the following formula, we believe firms can rapidly define and document their extreme yet plausible disruption scenarios:
[Cause X] Affecting [Activity/Resource X] Disrupting [Business Service X] Resulting in [Impact X]
Let’s bring this to life with a practical example:
“A database upgrade procedure (Cause) results in the corruption of customer account records (Resources), impacting our customers’ ability to view their current account balance and make payments out of their account (Important Business Services) for ten days, accompanied by a £5M penalty from the U.K.’s banking regulator (Impact and Customer Segment).”
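Because the formula has four fixed slots, it maps naturally onto a small record type that refuses to accept a scenario with a missing component. The Python sketch below is illustrative only. The Scenario class, its field names and the to_sentence helper are hypothetical and are not part of any OpRes product. The generated sentence follows the bracketed formula's connecting words rather than the exact phrasing of the example above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record for one extreme yet plausible scenario.

    Each field maps to one slot in the one-sentence formula:
    [Cause] Affecting [Activity/Resource] Disrupting [Business Service] Resulting in [Impact]
    """
    cause: str
    resource: str
    business_service: str
    impact: str

    def __post_init__(self) -> None:
        # Reject blank slots: a scenario missing any component is incomplete.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"scenario is missing its '{f.name}' component")

    def to_sentence(self) -> str:
        # Join the four components with the connecting words from the formula.
        return (
            f"{self.cause} affecting {self.resource} "
            f"disrupting {self.business_service} "
            f"resulting in {self.impact}."
        )


# The database upgrade example from the article, split into its four slots.
example = Scenario(
    cause="A database upgrade procedure",
    resource="customer account records",
    business_service="customers' ability to view balances and make payments",
    impact="a ten-day disruption and a £5M penalty from the U.K.'s banking regulator",
)
print(example.to_sentence())
```

Holding scenarios as structured records rather than free text makes it straightforward to spot the same cause or resource recurring across several important business services.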
Having a single sentence to underpin your operational resilience strategy is a simple and effective starting point. From here, your firm should determine what existing investments have been made to protect against this disruption. Has your firm already established a sound recovery and ongoing testing plan to ensure it can cope with this type of event? If so, you will need to evidence:
What are they?
Where are they stored?
Who owns them?
When were they last validated?
Do they still hold true given your current technology stacks, people, and geographical dispersion?
What gaps presently exist across them, and what is required to build confidence across the firm that your vulnerabilities have been or are being addressed?
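Several of these evidence questions (what, where, who, when, and which gaps) can be captured as fields in a simple register, so that stale or unowned plans surface automatically. The sketch below assumes a hypothetical RecoveryPlan record and a hypothetical 365-day revalidation interval; both are illustrative and should be replaced by your firm's own policy. The plan names and dates are made-up sample data. Whether a plan still holds true for your technology, people and locations remains a judgement call that no check like this can make for you.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RecoveryPlan:
    """Hypothetical evidence record for one recovery or testing plan."""
    name: str              # What is it?
    location: str          # Where is it stored?
    owner: str             # Who owns it?
    last_validated: date   # When was it last validated?
    known_gaps: list[str]  # What gaps presently exist?


# Assumed policy: plans must be revalidated at least every 365 days.
MAX_AGE = timedelta(days=365)


def needs_attention(plan: RecoveryPlan, today: date) -> list[str]:
    """Return the reasons a plan cannot yet be evidenced as current."""
    reasons = []
    if not plan.owner.strip():
        reasons.append("no named owner")
    if today - plan.last_validated > MAX_AGE:
        reasons.append(f"last validated {plan.last_validated.isoformat()}")
    # Every open gap is reported until it has been remediated.
    reasons.extend(f"open gap: {gap}" for gap in plan.known_gaps)
    return reasons


# Hypothetical sample register, for illustration only.
plans = [
    RecoveryPlan("Account database restore runbook", "Ops wiki", "Head of DBA",
                 date(2021, 1, 15), []),
    RecoveryPlan("Payments failover plan", "Shared drive", "",
                 date(2022, 3, 1), ["no tested failover to secondary region"]),
]

today = date(2022, 6, 1)
for plan in plans:
    for reason in needs_attention(plan, today):
        print(f"{plan.name}: {reason}")
```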
Establishing this initial baseline of data will be useful for tabletop exercises and war-game scenarios that validate the suitability of your firm's response and recovery procedures. However, firms must continue to identify ways to reduce the frequency and severity of disruptions to their ability to operate in a business-as-usual manner. This means frequent, regular testing of their response and recovery procedures, ongoing remediation of gaps that prevent speedy resolution of disruptions, and periodic reviews of their extreme yet plausible scenarios and impact tolerances as their business and the wider market change over time.
Thanks for reading,
Ben