1 of 54

Business Continuity and Disaster Recovery Planning

By: Dr. Mohammad Shoab

2 of 54

Business Continuity and Disaster Planning Basics

3 of 54

What Is a Disaster

  • Any natural or man-made event that disrupts the operations of a business in such a significant way that a considerable and coordinated effort is required to achieve a recovery.


4 of 54

Natural Disasters

  • Geological: earthquakes, volcanoes, tsunamis, landslides, and sinkholes
  • Meteorological: hurricanes, tornadoes, windstorms, hail, ice storms, snowstorms, rainstorms, and lightning
  • Other: avalanches, fires, floods, meteors and meteorites, and solar storms
  • Health: widespread illnesses, quarantines, and pandemics


5 of 54

Man-made Disasters

  • Labor: strikes, walkouts, and slow-downs that disrupt services and supplies
  • Social-political: war, terrorism, sabotage, vandalism, civil unrest, protests, demonstrations, cyber attacks, and blockades


6 of 54

Man-made Disasters (cont.)

  • Materials: fires and hazardous materials spills
  • Utilities: power failures, communications outages, water supply shortages, fuel shortages, and radioactive fallout from power plant accidents


7 of 54

How Disasters Affect Businesses

  • Direct damage to facilities and equipment
  • Transportation infrastructure damage
    • Delays deliveries and supplies, and keeps customers and employees from reaching the workplace
  • Communications outages
  • Utilities outages


8 of 54

How BCP and DRP Support Security

  • BCP (Business Continuity Planning) and DRP (Disaster Recovery Planning)
  • Security pillars: C-I-A
    • Confidentiality
    • Integrity
    • Availability
  • BCP and DRP directly support availability


9 of 54

BCP and DRP Differences and Similarities

  • BCP
    • Activities required to ensure the continuation of critical business processes in an organization
    • Alternate personnel, equipment, and facilities
    • Often includes non-IT aspects of business
  • DRP
    • Assessment, salvage, repair, and eventual restoration of damaged facilities and systems
    • Often focuses on IT systems


10 of 54

Industry Standards Supporting BCP and DRP

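  • ISO 27001: Requirements for Information Security Management Systems. Annex A, section 14 of the 2005 edition (section 17 in the 2013 edition) addresses business continuity management.
  • ISO 27002: Code of Practice for Information Security Management. Includes guidance on business continuity management.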


11 of 54

Industry Standards Supporting BCP and DRP (cont.)

  • NIST 800-34
    • Contingency Planning Guide for Information Technology Systems.
    • Seven-step process for BCP and DRP projects
    • From U.S. National Institute of Standards and Technology
  • NFPA 1600
    • Standard on Disaster / Emergency Management and Business Continuity Programs
    • From U.S. National Fire Protection Association


12 of 54

Industry Standards Supporting BCP and DRP (cont.)

  • NFPA 1620: The Recommended Practice for Pre-Incident Planning.
  • HIPAA: Requires a documented and tested disaster recovery plan
    • U.S. Health Insurance Portability and Accountability Act


13 of 54

Benefits of BCP and DRP Planning

  • Reduced risk
  • Process improvements
  • Improved organizational maturity
  • Improved availability and reliability
  • Marketplace advantage


14 of 54

The Role of Prevention

  • Not prevention of the disaster itself
    • Prevention of surprise and disorganized response
  • Reduction in impact of a disaster
    • Better equipment bracing
    • Better fire detection and suppression
    • Contingency plans that provide [near] continuous operation of critical business processes
    • Prevention of extended periods of downtime


15 of 54

Running a BCP / DRP Project

16 of 54

Running a BCP / DRP Project

  • Main phases
    • Pre-project activities
    • Perform a Business Impact Assessment (BIA)
    • Develop business continuity and recovery plans
    • Test resumption and recovery plans


17 of 54

Pre-project Activities

  • Obtain executive support
  • Formally define the scope of the project
  • Choose project team members
  • Develop a project plan
    • Get a project manager
  • Develop a project charter
    • A document listing all these items, plus budget and milestones


18 of 54

Business Impact Assessment (BIA)

19 of 54

Performing a Business Impact Assessment

  • Survey critical processes
  • Perform risk analyses and threat assessment
  • Determine Maximum Tolerable Downtime (MTD)
  • Establish key recovery targets


20 of 54

Survey In-scope Business Processes

  • Develop interview / intake template
  • Interview a rep from each department
    • Identify all important processes
    • Identify dependencies on systems, people, equipment
  • Collate data into database or spreadsheets
    • Gives a big picture, all-company view
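
One possible way to collate the interview results is a flat table with one row per process. The sketch below is only an illustration: the field names (`department`, `process`, `depends_on`) and the sample rows are assumptions, not something the slides prescribe.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """One row of a hypothetical BIA intake sheet."""
    department: str
    process: str
    depends_on: list[str] = field(default_factory=list)  # systems, people, equipment


def to_csv(records: list[ProcessRecord]) -> str:
    """Collate intake records into CSV text, one row per process."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["department", "process", "depends_on"])
    for rec in records:
        # Dependencies are joined with ';' so they fit in a single cell.
        writer.writerow([rec.department, rec.process, ";".join(rec.depends_on)])
    return buffer.getvalue()


# Made-up sample data for illustration only.
records = [
    ProcessRecord("Finance", "Invoicing", ["ERP system", "Billing staff"]),
    ProcessRecord("Support", "Customer help desk", ["CRM system", "Phone system"]),
]
sheet = to_csv(records)
print(sheet)
```

The resulting text can be opened in any spreadsheet tool, which gives the all-company view described above.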


21 of 54

Threat and Risk Analysis

  • Identify threats, vulnerabilities, and risks for each key process
    • Rank according to probability, impact, cost
    • Identify mitigating controls
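
A simple way to rank risks is by expected loss, that is, probability multiplied by impact. The sketch below assumes that simplification and leaves mitigation cost out; the `Risk` fields and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    process: str
    threat: str
    probability: float  # assumed annual likelihood, 0.0 to 1.0
    impact: float       # assumed loss per event, in currency units


def rank_risks(risks):
    """Sort risks by expected loss (probability x impact), highest first."""
    return sorted(risks, key=lambda r: r.probability * r.impact, reverse=True)


# Made-up sample data for illustration only.
risks = [
    Risk("Payments", "Power failure", 0.30, 50_000),
    Risk("Payments", "Earthquake", 0.02, 2_000_000),
    Risk("Invoicing", "Labor strike", 0.10, 100_000),
]
ranked = rank_risks(risks)
for r in ranked:
    print(f"{r.process:10} {r.threat:15} expected loss {r.probability * r.impact:>10,.0f}")
```

A rare but severe threat can outrank a frequent minor one, which is why probability and impact are considered together rather than separately.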


22 of 54

Determine Maximum Tolerable Downtime (MTD)

  • For each business process
  • Identify the maximum time that each business process can be inoperative before significant damage occurs or long-term viability is threatened
  • Probably an educated guess for many processes


23 of 54

Determine Maximum Tolerable Downtime (cont.)

  • Obtain senior management input to validate data
  • Publish into the same database / spreadsheet listing all business processes


24 of 54

Develop Statements of Impact

  • For each process, describe the impact on the rest of the organization if the process is incapacitated
  • Examples
    • Inability to process payments
    • Inability to produce invoices
    • Inability to access customer data for support purposes


25 of 54

Record Other Key Metrics

  • Examples
    • Cost to operate the process
    • Cost of process downtime
    • Profit derived from the process
  • Useful for upcoming Criticality Analysis


26 of 54

Ascertain Current Continuity and Recovery Capabilities

  • For each business process
    • Identify documented continuity capabilities
    • Identify documented recovery capabilities
    • Identify undocumented capabilities
      • What if the disaster happened tomorrow?


27 of 54

Develop Key Recovery Targets

  • Recovery time objective (RTO)
    • Period of time from disaster onset to resumption of business process
  • Recovery point objective (RPO)
    • Maximum period of data loss from onset of disaster counting backwards
    • Amount of work that will have to be done over
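
To make the two definitions concrete, the sketch below measures one incident against both targets. The timestamps and the 24-hour and 12-hour targets are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical timeline for a single incident.
last_good_backup = datetime(2024, 3, 1, 22, 0)   # most recent recoverable data
disaster_onset = datetime(2024, 3, 2, 9, 30)
process_resumed = datetime(2024, 3, 3, 15, 30)

# Targets assumed to have been agreed during the BIA.
rto_target = timedelta(hours=24)
rpo_target = timedelta(hours=12)

# RTO is measured forward from onset; RPO is measured backward from onset.
actual_downtime = process_resumed - disaster_onset
actual_data_loss = disaster_onset - last_good_backup

print("Downtime:", actual_downtime, "meets RTO:", actual_downtime <= rto_target)
print("Data loss window:", actual_data_loss, "meets RPO:", actual_data_loss <= rpo_target)
```

In this made-up case the process was down for 30 hours, missing the RTO, while 11.5 hours of work must be redone, which is within the RPO.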


28 of 54

Develop Key Recovery Targets (cont.)

  • Obtain senior management sign-off on RTO and RPO
  • Publish into the same database / spreadsheet listing all business processes
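
Once the targets sit alongside MTD in the same spreadsheet, a quick consistency check becomes possible. The sketch below relies on the common convention, not stated on the slides, that a process's RTO should not exceed its MTD. The column names and values are hypothetical.

```python
# Hypothetical rows from the BIA spreadsheet; all values are in hours.
processes = [
    {"process": "Payments", "mtd": 24, "rto": 12, "rpo": 1},
    {"process": "Invoicing", "mtd": 72, "rto": 96, "rpo": 24},
    {"process": "Customer support", "mtd": 48, "rto": 48, "rpo": 8},
]


def find_inconsistent_targets(rows):
    """Return names of processes whose RTO exceeds their MTD."""
    return [row["process"] for row in rows if row["rto"] > row["mtd"]]


problems = find_inconsistent_targets(processes)
print(problems)
```

Any process this check flags is worth raising with senior management before the targets are published.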


29 of 54

Sample Recovery Time Objectives

RTO            Technology(ies) required
8-14 days      New equipment, data recovery from backup
4-7 days       Cold systems, data recovery from backup
2-3 days       Warm systems, data recovery from backup
12-24 hours    Warm systems, recovery from high-speed backup media


30 of 54

Sample Recovery Time Objectives (cont.)

RTO            Technology(ies) required
6-12 hours     Hot systems, recovery from high-speed backup media
3-6 hours      Hot systems, data replication
1-3 hours      Clustering, data replication
< 1 hour       Clustering, near real-time data replication
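
The sample table can be read as a lookup from a recovery time target to the kind of technology needed. The sketch below is one interpretation, not a rule from the slides: it treats the first column as a recovery time range, and a target that falls between two rows is assigned to the faster row as the conservative choice. The function name `technology_for_rto` is hypothetical.

```python
# (lower bound of the range in hours, range label, technology) from the sample table,
# ordered from slowest to fastest recovery.
RTO_TIERS = [
    (8 * 24, "8-14 days", "New equipment, data recovery from backup"),
    (4 * 24, "4-7 days", "Cold systems, data recovery from backup"),
    (2 * 24, "2-3 days", "Warm systems, data recovery from backup"),
    (12, "12-24 hours", "Warm systems, recovery from high speed backup media"),
    (6, "6-12 hours", "Hot systems, recovery from high speed backup media"),
    (3, "3-6 hours", "Hot systems, data replication"),
    (1, "1-3 hours", "Clustering, data replication"),
    (0, "< 1 hour", "Clustering, near real time data replication"),
]


def technology_for_rto(rto_hours: float) -> str:
    """Return the technology for the slowest tier whose range starts at or below the target."""
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    for lower_bound, _label, technology in RTO_TIERS:
        if rto_hours >= lower_bound:
            return technology
    raise AssertionError("unreachable: the last tier starts at 0")


print(technology_for_rto(36))   # between the 12-24 hour and 2-3 day rows
print(technology_for_rto(0.5))
```

A 36-hour target falls in the gap between two rows, so the sketch picks the 12-24 hour row, because the 2-3 day row could miss the target.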


31 of 54

Criticality Analysis

  • Rank processes by criticality criteria
    • MTD (maximum tolerable downtime)
    • RTO (recovery time objective)
    • RPO (recovery point objective)
    • Cost of downtime or other metrics
    • Qualitative criteria
      • Reputation, market share, goodwill
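
The quantitative criteria above can be combined into a sort key. The sketch below assumes one particular precedence (MTD first, then RTO, then cost of downtime) and ignores the qualitative criteria. The precedence order, field names, and figures are all illustrative assumptions.

```python
# Hypothetical BIA rows; times are in hours, cost in currency units per hour.
processes = [
    {"process": "Payments", "mtd_hours": 24, "rto_hours": 12, "downtime_cost_per_hour": 5000},
    {"process": "Invoicing", "mtd_hours": 72, "rto_hours": 48, "downtime_cost_per_hour": 800},
    {"process": "Customer support", "mtd_hours": 24, "rto_hours": 24, "downtime_cost_per_hour": 1200},
]


def criticality_key(row):
    # Shorter MTD ranks first; ties are broken by shorter RTO, then by higher downtime cost.
    return (row["mtd_hours"], row["rto_hours"], -row["downtime_cost_per_hour"])


ranking = [row["process"] for row in sorted(processes, key=criticality_key)]
print(ranking)
```

In practice the qualitative criteria (reputation, market share, goodwill) would likely adjust this ordering by hand.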


32 of 54

Improve System and Process Resilience

  • For the most critical processes (based upon ranking in the criticality analysis)
    • Identify the biggest risks
    • Identify cost of mitigation
    • Can several mitigating controls be combined?
    • Do mitigating controls follow best / common practices?


33 of 54

Develop Business Continuity and Recovery Plans

34 of 54

Select Recovery Team Members

  • Selection criteria
    • Location of residence, relative to work and other key locations
    • Skills and experience (determines effectiveness)
    • Ability and willingness to respond
    • Health and family (determines likelihood of being able to serve)
    • Identify backups
      • Other team members, external resources


35 of 54

Emergency Response

  • Personnel safety: includes first-aid, searching for personnel, etc.
  • Evacuation: evacuation procedures to prevent any hazard to workers.
  • Asset protection: includes buildings, vehicles, and equipment.


36 of 54

Emergency Response (cont.)

  • Damage assessment: this could involve outside structural engineers to assess damage to buildings and equipment.
  • Emergency notification: response team communication, and keeping management and organization staff informed.


37 of 54

Damage Assessment and Salvage

  • Determine damage to buildings, equipment, utilities
    • Requires inside experts
    • Usually requires outside experts
      • Civil engineers to inspect buildings
      • Government building inspectors
  • Salvage
    • Identify working and salvageable assets
    • Cannibalize for parts or other uses


38 of 54

Notification

  • Many parties need to know the condition of the organization
    • Employees, suppliers, customers, regulators, authorities, shareholders, community
  • Methods of communication
    • Telephone call trees, web site, signage, media
    • Alternate means of communication must be identified
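
A telephone call tree can be pictured as a mapping from each caller to the people that caller is responsible for phoning. The sketch below walks such a tree breadth-first to list the calls in order. The roles and the `call_order` helper are hypothetical.

```python
from collections import deque

# Hypothetical call tree: each person phones the people listed under their name.
CALL_TREE = {
    "Incident commander": ["IT lead", "Facilities lead"],
    "IT lead": ["Database admin", "Network admin"],
    "Facilities lead": ["Security desk"],
}


def call_order(tree, root):
    """Return (caller, callee) pairs in breadth-first order starting from root."""
    calls = []
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            calls.append((caller, callee))
            queue.append(callee)  # the callee may have people of their own to phone
    return calls


for caller, callee in call_order(CALL_TREE, "Incident commander"):
    print(f"{caller} -> {callee}")
```

Spreading the calls across several people keeps any one person's list short, but each branch depends on its caller being reachable, which is one reason alternate means of communication must be identified.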


39 of 54

Personnel Safety

  • The number one concern in any disaster response operation
    • Emergency evacuation
    • Accounting for all personnel
    • Administering first-aid
    • Emergency supplies
      • Water, food, blankets, shelters
      • On-site employees could be stranded for several days


40 of 54

Communications

  • Communications are essential during emergency operations
  • Considerations
    • Avoid common infrastructure
      • Don't route emergency communications over the same wiring as normal communications
    • Diversify mobile services
    • Consider two-way radios
    • Consider satellite phones
    • Consider amateur radio


41 of 54

Public Utilities and Infrastructure

  • Often interrupted during a disaster
    • Electricity: UPS (Uninterruptible Power Supply), generator
    • Water: building could be closed if no water is available for fire suppression
    • Natural gas: heating
    • Wastewater: if disabled, building could be closed
    • Steam heat


42 of 54

Logistics and Supplies

  • Food and drinking water
  • Blankets and sleeping cots
  • Sanitation (toilets, showers, etc.)
  • Tools
  • Spare parts
  • Waste bins
  • Information
  • Communications
  • Fire protection (extinguishers, sprinklers, smoke alarms, fire alarms)


43 of 54

Business Resumption Planning

  • Alternate work locations
  • Alternate personnel
  • Communications
    • Emergency, support of business processes
  • Standby assets and equipment
  • Access to procedures, business records


44 of 54

Restoration and Recovery

  • Repairs to facilities, equipment
  • Replacement equipment
  • Restoration of utilities
  • Resumption of business operations in primary business facilities


45 of 54

Improving System Resilience and Recovery

  • Off-site media storage
    • Assurance of data recovery
  • Server clusters
    • Improved availability
    • Geographic clusters: members far apart
  • Data replication
    • Application, DBMS, OS, or hardware level
    • Maintains current data on multiple servers even in remote places


46 of 54

Training Staff

  • Everyday operations
  • Recovery procedures
  • Emergency procedures
  • Resumption procedures


47 of 54

Testing Business Continuity and Disaster Recovery Plans

48 of 54

Testing Business Continuity and Disaster Recovery Plans

  • Five levels of testing
    • Document review
    • Walkthrough
    • Simulation
    • Parallel test
    • Cutover test


49 of 54

Document Review

  • Review of recovery, operations, and resumption plans and procedures
  • Performed by individuals
  • Provide feedback to document owners


50 of 54

Walkthrough

  • Performed by teams
  • Group discussion of recovery, operations, and resumption plans and procedures
  • Brainstorming and discussion bring out new issues and ideas
  • Provide feedback to document owners


51 of 54

Simulation

  • Walkthrough of recovery, operations, and resumption plans and procedures in a scripted “case study” or “scenario”
  • Performed by teams
  • Places participants in a mental disaster setting that helps them discern real issues more easily


52 of 54

Parallel Test

  • Full or partial workload is applied to recovery systems
  • Performed by teams
  • Tests actual system readiness and accuracy of procedures
  • Production systems continue to operate and support actual business processes


53 of 54

Cutover Test

  • Production systems are shut down or disconnected; recovery systems assume full actual workload
  • Risk of interrupting real business
  • Gives confidence in DR (Disaster Recovery) system if it works


54 of 54

Thank You
