Business Continuity Demystified – The Power of Business Continuity
Key takeaways
1. Understanding disruptions: Recognise that disruptions can be both anticipated and unanticipated. Anticipated disruptions are those you can foresee based on historical data or trends, while unanticipated ones are sudden and beyond your control, such as natural disasters or cyberattacks.
2. Maximum Acceptable Outage (MAO): Determine the maximum amount of downtime your organisation can tolerate before the damage becomes irreparable. This varies depending on your industry, criticality of services, customer expectations, and regulatory requirements. Balancing this time frame is crucial.
3. Planning for negative impacts: Business continuity planning involves preparing for the outcome of negative impacts caused by disruptions. Your plan should guide your organisation’s response and recovery strategies to maintain essential services during and after a disruptive event.
4. All-hazards approach: Embrace an all-hazards approach to business continuity planning. This means preparing for a wide range of potential disruptions, whether they are natural disasters, IT failures, supply chain issues, or other unforeseen events. This approach ensures comprehensive preparedness and resource efficiency.
5. Risk appetite and tolerance: Align your business continuity planning with your organisation’s risk appetite and risk tolerance. This involves setting expectations for how much disruption is acceptable during adverse events. Striking a balance between risk and resilience is critical for informed decision-making.
Business continuity is a critical aspect of modern organisations’ resilience. By understanding disruptions, acceptable timeframes, predefined capacity, and adopting an all-hazards approach to business continuity planning, organisations can better protect their operations and ensure continuity, irrespective of the type of disruption encountered.
ISO 22301:2019 Security and resilience – Business continuity management systems – Requirements provides a concise definition, stating that business continuity is the “capability of an organisation to continue the delivery of products and services within acceptable timeframes at predefined capacity during a disruption” (emphasis added).
This will be our starting point to demystify and simplify business continuity for practical implementation.
What is a disruption?
ISO 22301 defines disruption as “an incident, whether anticipated or unanticipated, that causes an unplanned, negative deviation from the expected delivery of products and services according to an organisation’s objectives” (emphasis added). That might seem paradoxical at first glance, but it is designed to encompass a wide range of situations and emphasise the need for comprehensive business continuity planning.
Let’s start by understanding why disruptions can be both anticipated and unanticipated.
Anticipated disruptions
Anticipated disruptions are events or incidents that an organisation can reasonably foresee or predict based on historical data, industry trends, or known vulnerabilities.
These disruptions may include things like:
- Seasonal fluctuations in demand.
- Routine maintenance downtime.
- Expected IT system upgrades or migrations.
- Planned facility relocations.
- Changes in regulations that are known in advance.
While these disruptions are anticipated, they can still cause negative deviations from expected delivery if they are not planned for.
For example, if an organisation doesn’t adequately prepare for a planned IT system upgrade, it might experience downtime and service interruptions that negatively impact its operations.
Unanticipated disruptions
Unanticipated disruptions are events or incidents that could not have been reasonably predicted or expected. They are often sudden and may result from factors such as:
- Natural disasters (earthquakes, hurricanes, floods).
- Technological failures (server crashes, data breaches).
- Cyberattacks.
- Supply chain disruptions due to unforeseen events (e.g., a supplier’s factory burning down).
These types of disruptions are typically beyond an organisation’s control. They may have severe and immediate impacts on its ability to deliver products and services according to its objectives.
The key takeaway is that disruptions, whether anticipated or unanticipated, can have a negative impact on an organisation’s operations.
Business continuity planning, as encouraged by ISO 22301, is about preparing for and mitigating the negative impacts of disruptions, regardless of whether they were expected or unexpected. Organisations need to be prepared for both the disruptions they can anticipate and those they cannot, so that they can maintain continuity in the delivery of products and services according to their objectives.
In the case of anticipated disruptions, the emphasis is on effective planning and mitigation to minimise the negative impact. For unanticipated disruptions, the emphasis is on readiness and the ability to respond swiftly and effectively to minimise the consequences and deviations from expected delivery.
Maximum acceptable outage
The term “within acceptable timeframes”, or acceptable downtime, in the context of business continuity as defined by ISO 22301, refers to the maximum amount of time that an organisation can tolerate a disruption or downtime before the damage becomes irreparable.
Also known as the Maximum Acceptable Outage (MAO), this time is a critical part of business continuity planning because it helps determine the criticality of business functions and activities and the resources required for recovery.
The acceptable timeframe can vary greatly depending on the specific organisation and the nature of its operations within the context of the “delivery of products and services according to an organisation’s objectives”.
For example, for some businesses, an “acceptable timeframe” might be a few hours, while for others it could be a few days or even longer.
This concept of acceptable downtime is somewhat flexible and can vary from one organisation to another, as it depends on several factors:
- Criticality of products/services – The acceptable downtime will vary depending on how critical the products or services are to the organisation, its stakeholders, and the achievement of its strategic objectives. Highly critical services may need to be restored within minutes or hours, while less critical ones might have a longer “acceptable timeframe”.
- Customer and stakeholder expectations – The expectations of your customers and stakeholders play a significant role in determining what is considered an “acceptable timeframe”. Some industries and businesses have specific service level agreements that define acceptable downtime.
- Regulatory and legal requirements – Certain industries, such as healthcare or finance, may be subject to strict regulatory requirements that dictate the acceptable downtime. Compliance with these regulations is essential.
- Market competition – In highly competitive markets, the acceptable outage may be shorter to ensure that customers don’t switch to competitors due to prolonged downtime.
- Risk assessment – The organisation’s risk assessment and business impact analysis should help determine what downtimes are acceptable. It considers factors like financial losses, reputation damage, and operational impacts.
- Resource availability – The availability of resources, including employees, technology, and infrastructure, can affect the speed at which recovery is possible. The organisation’s readiness and maturity to mobilise these resources is critical.
- The complexity of operations – Complex operations may take longer to recover and restore, especially if they involve intricate processes, multiple dependencies, and specialised equipment.
- Communication and reputation management – For some organisations, the acceptable downtime may also include the ability to effectively communicate with stakeholders and manage their reputation during a disruption.
- Cost considerations – Faster recovery and resumption often require more resources, which can be costly. Balancing the cost of recovery against the acceptable timeframe is a key consideration.
It’s important to note that these timeframes should be established in advance as part of the organisation’s business continuity plan, and they should be based on a thorough risk assessment and business impact analysis. This ensures that the organisation is prepared to respond effectively to disruption and can continue to deliver products and services at a predefined capacity.
Organisations should aim to strike a balance between preparedness and resource efficiency to ensure they can recover swiftly and effectively from disruptions without compromising their long-term viability.
A prolonged period of maximum acceptable outage
While it’s essential to have realistic and flexible plans that account for various scenarios, planning for a prolonged maximum acceptable outage (MAO) can be unwise because of the operational, financial, and reputational risks associated with extended downtime. The main reasons are:
- Operational risk – Prolonged outages can lead to severe operational disruptions. If your organisation plans for extended downtime, it may struggle to maintain essential operations, causing significant financial losses and potential harm to your reputation.
- Customer satisfaction – In many industries, customers have high expectations for service availability. Planning for prolonged outages may result in customer dissatisfaction and the loss of clients to competitors who can maintain better service levels.
- Compliance issues – Regulatory bodies often require organisations to meet specific recovery time objectives (RTOs) and service level agreements (SLAs). Planning for extended outages may lead to non-compliance with these requirements, resulting in legal and financial penalties.
- Resource allocation – Preparing for a prolonged MAO may require allocating excessive resources to business continuity and disaster recovery efforts. This allocation can be costly and may divert resources from other critical business activities.
- Competitive disadvantage – In highly competitive markets, businesses that can recover quickly from disruptions gain a competitive advantage. Planning for prolonged outages can put your organisation at a disadvantage, as competitors may be better positioned to maintain operations and capture market share.
- Recovery complexity – The longer an outage lasts, the more complex the recovery process becomes. Planning for a prolonged MAO can lead to overly complex recovery strategies that are difficult to execute effectively.
- Employee morale – Prolonged outages can have a negative impact on employee morale and motivation. Employees may become frustrated and disengaged if they perceive that the organisation is unprepared for disruptions.
- Financial impact – Extended downtime can result in substantial financial losses, including lost revenue, increased recovery costs, and potential legal liabilities. Planning for shorter recovery times minimises these financial risks.
- Stakeholder confidence – Stakeholders, including investors and partners, may lose confidence in an organisation that plans for prolonged outages. Demonstrating a commitment to timely recovery can enhance stakeholder trust.
- Inefficient resource utilisation – Planning for a prolonged MAO may require overinvestment in resources and infrastructure, which can lead to inefficient resource utilisation. This inefficiency can strain the organisation’s budget and hinder its competitiveness.
A shortened period of maximum acceptable outage
Planning for a shortened period of Maximum Acceptable Outage (MAO) can also be unwise, as it carries its own set of risks and challenges. While it’s essential to ensure rapid recovery and minimise downtime, overly aggressive planning for a very short MAO can lead to the following issues:
- Resource over-extension – Planning for an extremely short MAO may require substantial resource allocation to achieve rapid recovery. This can result in overextension of resources, including personnel, technology, and infrastructure. Overallocation can strain the organisation’s budget and potentially lead to inefficiencies.
- Increased costs – Achieving a very short MAO often involves significant investments in redundant systems, backup facilities, and rapid response teams. These investments can drive up costs, especially if they are not commensurate with the actual risks and consequences of downtime.
- Complexity – Implementing strategies for an ultra-short MAO can introduce complexity into an organisation’s operations. Complexity can lead to challenges in managing and maintaining the recovery solutions, increasing the likelihood of errors and complications during the recovery process.
- Diminished flexibility – Overly aggressive planning can limit an organisation’s ability to adapt to evolving situations. In some cases, a shorter MAO may not be feasible due to unforeseen circumstances, such as widespread natural disasters or supply chain disruptions.
- Risk of failure – The pursuit of an extremely short MAO can put immense pressure on recovery teams and systems. This pressure can lead to rushed decisions and increased risk of failure during recovery efforts.
- Operational disruption – Focusing solely on minimising downtime can divert attention away from other essential aspects of business continuity planning, such as risk assessment and mitigation. This narrow focus can leave an organisation vulnerable to other types of disruptions.
- Neglected preventive measures – Emphasising rapid recovery may lead organisations to neglect preventive measures that could reduce the likelihood of disruptions in the first place. Prevention is often more cost-effective and less disruptive than recovery efforts.
- Employee burnout – Insisting on an extremely short MAO without adequate resources or support can lead to employee burnout. Team members may be under constant stress to meet unrealistic recovery timeframes.
- Regulatory compliance – In some industries, regulatory requirements and service level agreements (SLAs) dictate specific recovery timeframes. Planning for a significantly shorter MAO than required may result in non-compliance, leading to legal and financial consequences.
- Competitive disadvantage – While rapid recovery is essential, planning for an unreasonably short MAO can make an organisation less competitive if competitors can provide similar services with more cost-effective recovery strategies.
Planning for a shortened period of Maximum Acceptable Outage must strike a balance between achieving rapid recovery and considering the practicality, cost-effectiveness, and sustainability of the chosen recovery strategies.
The goal should be to align recovery timeframes with the criticality of business functions and the organisation’s overall risk tolerance while avoiding unnecessary resource overextension and complexity.
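The balance described above can be illustrated with a minimal sketch. It assumes a simple cost model in which the yearly cost of a recovery strategy is its standing cost plus the loss incurred while recovering. The strategy names, costs, loss rate, and the 48-hour MAO are all hypothetical figures chosen for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryStrategy:
    name: str
    recovery_hours: float  # time to restore service after a disruption
    annual_cost: float     # yearly cost of keeping the strategy ready


def cheapest_strategy(strategies, mao_hours, loss_per_hour, disruptions_per_year=1.0):
    """Pick the strategy with the lowest expected yearly cost that still meets the MAO."""
    # Discard strategies that cannot recover within the maximum acceptable outage.
    feasible = [s for s in strategies if s.recovery_hours <= mao_hours]
    if not feasible:
        raise ValueError("no strategy recovers within the MAO")

    def expected_yearly_cost(s):
        # Assumed model: standing cost plus the loss incurred while recovering.
        downtime_loss = s.recovery_hours * loss_per_hour * disruptions_per_year
        return s.annual_cost + downtime_loss

    return min(feasible, key=expected_yearly_cost)


# Hypothetical options and figures, for illustration only.
strategies = [
    RecoveryStrategy("Hot standby site", recovery_hours=1, annual_cost=500_000),
    RecoveryStrategy("Cloud failover", recovery_hours=8, annual_cost=120_000),
    RecoveryStrategy("Restore from backup", recovery_hours=72, annual_cost=20_000),
]

best = cheapest_strategy(strategies, mao_hours=48, loss_per_hour=10_000)
print(best.name)
```

In this made-up example the fastest option is not the cheapest overall, and the slowest option is excluded because it cannot meet the MAO, which mirrors the trade-off between overly short and overly long recovery targets.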
Suggested two business days of acceptable downtime
Two business days of acceptable downtime would suit many service-based organisations.
The two-day cut-off is based on a worst-case system uptime of only 99.5%. Over a year of continuous operation, that allows up to about 1d 19h 48m of downtime in total (0.5% of 8,760 hours is 43.8 hours), which is rounded up to two business days for business continuity planning purposes from a maximum acceptable outage (MAO) perspective.
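The arithmetic behind that figure can be checked with a short sketch. The function name is hypothetical, and the calculation assumes a 365-day year of continuous (24x7) service.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, service expected 24x7


def annual_downtime(uptime_percent: float) -> timedelta:
    """Return the total downtime per year implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = (100 - uptime_percent) / 100
    return timedelta(hours=HOURS_PER_YEAR * downtime_fraction)


downtime = annual_downtime(99.5)
print(downtime)  # 1 day, 19:48:00
```

The same function can be used to see how quickly the allowance shrinks at higher uptime levels, for example `annual_downtime(99.9)`.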
When there are external dependencies on IT service providers, we need to be realistic about what and how much can be done by organisations within the MAO period and the resources to do so. This is a balancing act.
Predefined capacity during a disruption
The term “predefined capacity during a disruption” in the context of ISO 22301’s definition of business continuity refers to an organisation’s ability to maintain and deliver its products and services at a specified level or capacity even when faced with a disruptive event. It’s the capacity an organisation aims to sustain, even during challenging circumstances.
Predefining the organisation’s capacity during a disruption is essential for risk management, resource allocation, customer satisfaction, and maintaining operational resilience. It provides a clear framework for business continuity planning and helps organisations navigate disruptions with greater confidence and effectiveness.
Focus on time-sensitive, mission-critical business functions
Focusing on only a handful of time-sensitive, mission-critical business functions (known also as “critical business functions”) and their associated critical IT resources that are aligned with the achievement of the organisation’s strategic objectives is essential within the context of ISO 22301 because it allows organisations to prioritise their efforts, allocate resources effectively, and ensure the continuity of functions that are vital for survival, compliance, and maintaining stakeholder trust. This approach enhances an organisation’s overall resilience and ability to thrive in the face of disruptions.
To identify these functions, assemble a cross-functional team to assess each business function’s role and dependencies; conduct a business impact analysis (BIA) to evaluate financial, operational, and reputational impacts; rate functions based on their criticality; identify essential IT resources; consider regulatory requirements, customer expectations, and risk factors; and simulate disruption scenarios.
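As a rough illustration of the rating step, the sketch below scores each function on the three impact dimensions mentioned above and ranks the results. The weights, the 1 to 5 rating scale, and the sample functions are assumptions made for this example; ISO 22301 does not prescribe a scoring formula.

```python
from dataclasses import dataclass

# Hypothetical weights for the three impact dimensions; they sum to 1.
WEIGHTS = {"financial": 0.4, "operational": 0.35, "reputational": 0.25}


@dataclass(frozen=True)
class BusinessFunction:
    name: str
    financial: int     # impact rating, 1 (negligible) to 5 (severe)
    operational: int
    reputational: int

    def criticality(self) -> float:
        """Weighted average of the three impact ratings."""
        return sum(getattr(self, dim) * w for dim, w in WEIGHTS.items())


def rank_by_criticality(functions):
    """Return functions ordered from most to least critical."""
    return sorted(functions, key=lambda f: f.criticality(), reverse=True)


# Illustrative ratings only; real values would come from the BIA.
functions = [
    BusinessFunction("Payroll", financial=3, operational=2, reputational=2),
    BusinessFunction("Transaction processing", financial=5, operational=5, reputational=4),
    BusinessFunction("Internal newsletter", financial=1, operational=1, reputational=1),
]

ranked = rank_by_criticality(functions)
for f in ranked:
    print(f"{f.name}: {f.criticality():.2f}")
```

A ranking like this is only a starting point for discussion: the cross-functional team would still need to review dependencies and regulatory obligations before settling on the handful of functions treated as critical.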
Real-life examples of time-sensitive, mission-critical business functions and their associated critical IT resources in various industries include:
- Financial institutions
  - Critical business function: Transaction processing.
  - Critical IT resources: Core banking systems, payment processing software, and real-time transaction databases.
- Healthcare providers
  - Critical business function: Patient care.
  - Critical IT resources: Electronic Health Records (EHR) systems, patient monitoring devices, and telemedicine platforms.
- E-commerce companies
  - Critical business function: Order fulfilment.
  - Critical IT resources: Inventory management systems, order processing software, and online payment gateways.
- Manufacturing companies
  - Critical business function: Production and supply chain management.
  - Critical IT resources: Manufacturing execution systems (MES), supply chain optimisation software, and demand forecasting tools.
- Airline industry
  - Critical business function: Flight operations.
  - Critical IT resources: Air traffic control systems, flight planning software, and passenger booking systems.
- Emergency services
  - Critical business function: Emergency response.
  - Critical IT resources: Computer-aided dispatch (CAD) systems, first responder communication networks, and GPS tracking for vehicles.
- Telecommunications providers
  - Critical business function: Network operations.
  - Critical IT resources: Network monitoring tools, routers and switches, and redundancy and failover systems.
- Energy utilities
  - Critical business function: Power generation and distribution.
  - Critical IT resources: Supervisory Control and Data Acquisition (SCADA) systems, grid management software, and outage management systems.
- Retailers
  - Critical business function: Point of Sale (POS) operations.
  - Critical IT resources: POS terminals, payment processing networks, and inventory tracking systems.
- Government agencies
  - Critical business function: Policy formulation and implementation, and grant management.
  - Critical IT resources: Taxation software, secure financial databases, and online taxpayer portals.
Focus on critical IT resources that enable the critical business functions
Focusing on critical IT resources that drive and enable critical business functions is paramount for any organisation’s resilience. These critical IT resources serve as the backbone, seamlessly integrating technology with operations.
Whether it’s the core banking systems in financial institutions, electronic health records in healthcare, or order processing software in e-commerce, these resources are the linchpin for delivering products and services within acceptable timeframes.
By prioritising and strengthening their availability and reliability, organisations can ensure the uninterrupted flow of critical business processes, safeguarding their ability to navigate disruptions effectively and continue serving their customers and stakeholders without compromise. Recognising and strengthening these critical IT resources is not just a best practice but a strategic imperative for sustained organisational success.
It’s highly likely that these critical IT resources will also serve as a foundation for other non-critical business functions. Take, for instance, a government agency focused on policy and funding, where Microsoft 365 plays a critical role. By strengthening this critical IT infrastructure through Microsoft 365’s cloud SaaS capabilities, the organisation can substantially bolster its organisation-wide resilience, fostering a more robust and adaptable operational environment.
Risk appetite and risk tolerance
Understand the terms “within acceptable timeframes” and “predefined capacity during a disruption” in the context of risk appetite and risk tolerance. This involves considering how an organisation sets its expectations and boundaries for business continuity during adverse events.
Risk appetite and risk tolerance are key components of an organisation’s overall risk management framework. They define how much risk an organisation is willing to accept to achieve its objectives.
“Within acceptable timeframes” and “predefined capacity during a disruption” should align with an organisation’s risk appetite and tolerance to ensure that risk management efforts are consistent with its strategic goals.
By defining what is considered “acceptable” in terms of timeframes and capacity, organisations strike a balance between risk and resilience. They can assess how much disruption is tolerable and how much capacity should be maintained during adverse events. This balance is critical for making informed decisions about preventive measures, response strategies, and resource allocation.
Defining these terms helps organisations design plans that align with their risk preferences. For example, organisations with a low risk tolerance may plan for shorter downtime and higher capacity during disruptions, while those with a higher risk tolerance may accept longer downtime and lower capacity.
Risk appetite
Risk appetite is the level of risk that an organisation is willing to accept to achieve its objectives. It represents the organisation’s willingness to take risks in pursuit of its goals or strategic objectives.
In the context of business continuity, risk appetite relates to how much disruption an organisation is willing to tolerate while still maintaining its predefined capacity.
For example, a risk-averse organisation may have a low risk appetite and aim to maintain almost full capacity even during a disruption, whereas a more risk-tolerant organisation may accept a lower capacity during disruptions if it means taking on fewer preventive costs.
Risk tolerance
Risk tolerance is the specific amount or extent of risk that an organisation is willing to endure or tolerate within its risk appetite. It is often expressed quantitatively or qualitatively. It helps define the acceptable boundaries for risk.
In the context of business continuity, risk tolerance relates to how much deviation from predefined capacity during a disruption is acceptable.
For instance, an organisation may set a risk tolerance that allows for a temporary reduction in capacity by up to 20% during a disruption while still considering it within acceptable limits.
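A quantitative tolerance like this can be expressed as a simple check. The sketch below is illustrative only: the function name, the capacity unit (support requests per hour), and the sample figures are assumptions.

```python
def within_risk_tolerance(normal_capacity: float,
                          disrupted_capacity: float,
                          max_reduction: float) -> bool:
    """Check whether capacity during a disruption stays inside the tolerated reduction.

    max_reduction is the risk tolerance as a fraction, e.g. 0.20 for "up to 20%".
    """
    if normal_capacity <= 0:
        raise ValueError("normal_capacity must be positive")
    reduction = (normal_capacity - disrupted_capacity) / normal_capacity
    return reduction <= max_reduction


# Hypothetical figures: 1,000 support requests per hour under normal conditions.
print(within_risk_tolerance(1000, 850, 0.20))  # True: a 15% reduction is tolerated
print(within_risk_tolerance(1000, 700, 0.20))  # False: a 30% reduction exceeds tolerance
```

Expressing the tolerance as a number makes it possible to test recovery arrangements against it during exercises, rather than judging "acceptable" after the fact.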
The interplay between capacity, risk appetite, and risk tolerance
These factors guide the organisation’s resilience strategy. It’s imperative to understand these terms to make informed decisions regarding prevention, response, and recovery measures.
The interplay between “predefined capacity during a disruption,” risk appetite, and risk tolerance can be illustrated as follows:
- Scenario 1 (low risk appetite, low risk tolerance) – An organisation with a low risk appetite may aim to maintain close to full capacity during disruptions and have a low risk tolerance for any deviation from this predefined capacity. It will invest heavily in preventive measures to minimise disruptions. For example, Service Provider A may specify that it can handle 99% of support requests within one hour under normal conditions. During a disruption, the aim is to maintain a similar level of support, albeit possibly with a slightly longer response time.
- Scenario 2 (moderate risk appetite, moderate risk tolerance) – An organisation with a moderate risk appetite may accept some reduction in capacity during disruptions if it doesn’t exceed a predefined threshold (i.e., risk tolerance). They strike a balance between preventive measures and cost-effectiveness.
- Scenario 3 (high risk appetite, high risk tolerance) – An organisation with a high risk appetite may accept a significant reduction in capacity during disruptions, viewing it as an opportunity for cost savings. It has a high risk tolerance and may prioritise recovery and adaptation over prevention. For example, Service Provider B may specify that during a disruption it only aims to handle 40% of support requests (compared with Service Provider A’s 99%).
In essence, the organisation’s approach to “predefined capacity during a disruption” reflects its risk appetite and risk tolerance. This also applies to the determination of “acceptable timeframes” or acceptable downtimes.
Defining and understanding these terms in the context of business continuity planning helps organisations make informed decisions about the level of resilience they wish to achieve and the trade-offs they are willing to make between prevention, response, and recovery measures.
These trade-offs will require a significant investment of resources.
Plan for the outcome of negative impacts
ISO 22301 defines a business continuity plan as “documented information that guides an organisation to respond to a disruption and resume, recover and restore the delivery of products and services consistent with its business continuity objectives” (emphasis added).
Business continuity plans provide guidance to employees on response and recovery strategies and procedures following a disruptive event, to support the continuity of essential services until their restoration to business as usual (the new normal).
Disruptions, whether anticipated or not, can generally impact:
- Site or asset availability.
- Key IT system functionality.
- Workforce availability.
- Third-party dependencies.
The loss or unavailability of these critical resources will in turn cause the downstream loss or unavailability of services that are delivered to your stakeholders.
Taking an all-hazards approach to your planning
Writing a business continuity plan based on an all-hazards approach is crucial for several reasons, especially considering that disruptions, whether anticipated or unanticipated, can significantly impact an organisation’s operations.
An all-hazards approach to business continuity planning is important for the following reasons:
- Comprehensive preparedness – An all-hazards approach ensures that your organisation is prepared for a wide range of potential disruptions, regardless of their origin or nature. By considering various types of hazards, you increase the chances of being ready for both common and rare events.
- Risk mitigation – Disruptions can be unpredictable, and some may not fit neatly into predefined categories. An all-hazards approach allows you to identify and mitigate risks that might not have been anticipated initially. This proactive approach can help prevent or minimise the impact of unanticipated disruptions.
- Resource efficiency – Developing separate plans for each type of disruption can be resource-intensive and cumbersome. An all-hazards approach streamlines and simplifies your business continuity planning process as many preparedness and response measures are applicable across different types of events. This efficiency can save time and resources.
- Flexibility and adaptability – Disasters and disruptions often bring unforeseen challenges. An all-hazards approach encourages flexibility and adaptability in your response. Instead of adhering to rigid plans for specific scenarios, you can adjust strategies based on the unique characteristics of the disruption.
- Cost-effective planning – An all-hazards approach helps organisations avoid over-investing in preparedness for one specific type of disruption while neglecting others. It encourages a balanced allocation of limited resources to address a spectrum of potential risks.
- Improved communication – A single, all-encompassing organisation-wide business continuity plan promotes better communication and coordination across the organisation. When everyone follows the same framework, it’s easier to ensure that information flows smoothly and that teams can work together effectively during a crisis.
- Risk awareness and culture – Fostering a culture of risk awareness is essential. An all-hazards approach reinforces the idea that disruptions can come from various sources, making everyone within the organisation more vigilant and proactive in identifying potential threats.
- Regulatory compliance – Many regulatory bodies and industry standards, such as ISO 22301, encourage or require organisations to adopt an all-hazards approach in their business continuity planning. Compliance can be easier to achieve when your plan aligns with this approach.
- Simplification of training and testing – Training and testing become more straightforward when you have a single, versatile organisation-wide business continuity plan that applies to various scenarios. This can lead to simpler and more effective training programs and exercises.
- Better stakeholder confidence – Customers, partners, and stakeholders often feel more confident in organisations that have a robust all-hazards organisation-wide business continuity plan. Demonstrating your commitment to resilience and preparedness can enhance trust and reputation.
Given the unpredictable nature of disruptions, this all-hazards approach to business continuity planning helps ensure that your organisation can continue delivering products and services consistently, regardless of the type of disruption it faces.
Summary
At its heart, business continuity is a testament to an organisation’s resilience, and it’s defined succinctly by ISO 22301 as the “capability of an organisation to continue the delivery of products and services within acceptable timeframes at predefined capacity during a disruption.”
This article embarks on a journey to demystify this critical concept, breaking down the intricacies of disruptions, anticipated and unanticipated, while highlighting the paramount importance of comprehensive business continuity planning.
We delve into the nuances of the maximum acceptable outage and the interplay between risk appetite, risk tolerance, and predefined capacity during a disruption.
Moreover, we emphasise the necessity of adopting an all-hazards approach to business continuity planning, offering a holistic perspective to address the myriad challenges disruptions can pose.
In a world where uncertainty looms large, understanding and implementing effective but simplified business continuity strategies can be the difference between survival and stagnation.