The DevOps Risk and Controls Matrix (RCM)

We Will Manage This Risk

By Performing These Activities

Which Fulfill These Controls

Unauthorized changes to production

Making sure the right people are making the changes through:

  • Multi-factor authentication
  • Role-based access control
  • Managing all credentials, tokens, connection strings, endpoints, and other secrets in an encrypted vault and rotating them on a periodic basis or upon relevant business events (such as employee separation)
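
As a rough illustration of the rotation rule in the last item, the check below flags a secret when it is older than a rotation period or when a relevant business event has occurred. The 90-day period and the names `ROTATION_PERIOD`, `needs_rotation`, and `owner_separated` are assumptions made for this sketch; the matrix does not prescribe a period.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy for illustration only: rotate every 90 days.
ROTATION_PERIOD = timedelta(days=90)


def needs_rotation(last_rotated, now, owner_separated=False):
    """Return True if the secret is past its rotation period, or if a
    relevant business event (here, separation of its owner) occurred."""
    return owner_separated or (now - last_rotated) >= ROTATION_PERIOD


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_rotation(now - timedelta(days=120), now))
```

A scheduled job could run a check like this against vault metadata and open a rotation task for every secret it flags.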

Assuring that changes can’t be made manually by ensuring:

  • No human access to production except by time-limited tokens granted under access approval rules (“just-in-time admin”)
  • All change events are logged and monitored
  • Production changes are made only via secure pipelines (inputs to the pipeline are known and reviewed, and changes to the pipeline steps are reviewed and approved)
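
The "just-in-time admin" item above can be pictured as a signed token that carries its own expiry. The following is a minimal sketch under stated assumptions, not a production design: `SIGNING_KEY`, `issue_token`, and `is_valid` are hypothetical names, and a real system would keep the key in the vault and issue tokens only after the access approval rules pass.

```python
import hashlib
import hmac
import time

# Hypothetical signing key; in practice this would live in the encrypted vault.
SIGNING_KEY = b"example-key-not-for-production"


def _sign(payload):
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user, ttl_seconds, now=None):
    """Issue a token of the form 'user:expiry:signature'."""
    now = time.time() if now is None else now
    payload = f"{user}:{int(now + ttl_seconds)}"
    return f"{payload}:{_sign(payload)}"


def is_valid(token, now=None):
    """Accept a token only if its signature matches and it has not expired."""
    now = time.time() if now is None else now
    try:
        user, expiry, sig = token.rsplit(":", 2)
        expiry_ts = int(expiry)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{user}:{expiry}")
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(sig.encode(), expected.encode()) and now < expiry_ts
```

Because the expiry is part of the signed payload, a holder cannot extend their own access without invalidating the signature.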

Data is protected and isolated through:

  • Data encryption at rest and in transit
  • Separation of networks and domains

Our development practices are representative of the responsible work of our craft; for example:

  • All sources (infra, app, tests, policies, and pipeline) are kept under version control with access permissions
  • All changes to sources are peer reviewed
  • Critical business transactions are tested in production
  • Incident response processes have service-level expectations

Identity management, centralized access management, encryption, secrets management, separation of domains, secure pipelines

Production breaks due to human error or untested/insecure code

  • All sources (infra, app, tests, policies, and pipeline) are kept under version control with access permissions
  • All changes to sources are peer reviewed
  • Deployment authorization
  • Automated software composition analysis
  • Automated static code analysis
  • Automated dynamic analysis
  • Automated security Behavior-Driven Development (BDD) with evil user stories
  • Automated “chaos” testing (such as Netflix Chaos Monkey)
  • Product team fully accountable for quality of service in production

Test traceability, test results (including security tests and scans)

Material misstatement of financial data

  • Segregate financially relevant systems and services
  • Authorized code review (who, what, where)
  • Rotation of job responsibility
  • Code ownership at a team level
  • Anomaly detection
  • “Just-in-time admin”

Least privilege access, code review, four eyes on code and deployment

Intellectual property and licensing violation (open source/commercial)

  • Software composition analysis
  • Approved software inventory
  • Bill of materials on every build
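
One way to connect the three activities above is to compare each build's bill of materials with the approved software inventory and fail the build on any mismatch. This is a sketch only: the `APPROVED` mapping, the `(name, license)` tuple format, and `unapproved_components` are assumptions for illustration, not a format the matrix specifies.

```python
# Hypothetical approved inventory: component name -> licenses approved for it.
APPROVED = {
    "requests": {"Apache-2.0"},
    "numpy": {"BSD-3-Clause"},
}


def unapproved_components(bill_of_materials):
    """Return the (name, license) pairs from a build's bill of materials
    that are missing from the approved inventory or carry a license that
    was not approved for that component."""
    return [
        (name, lic)
        for name, lic in bill_of_materials
        if lic not in APPROVED.get(name, set())
    ]


if __name__ == "__main__":
    bom = [("requests", "Apache-2.0"), ("leftpad", "MIT")]
    violations = unapproved_components(bom)
    if violations:
        raise SystemExit(f"Unapproved components: {violations}")
```

Run as a pipeline step, a non-empty result would stop the build before the artifact is published.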

Verification of authorized software

Data breach from unauthorized access

  • Tokenization of fully defined personally identifiable information (PII)
  • Encryption at rest and in transit
  • Data retention policy
  • Ethical hacking, “red teaming” to identify vulnerabilities on a regular cadence
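
To make the tokenization item above concrete: tokenization replaces a sensitive value with a random surrogate and keeps the mapping in a separately secured store, so downstream systems never handle the original value. The `TokenVault` class below is a hypothetical in-memory stand-in for such a store, sketched for illustration only.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a secured token store (illustrative only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a random token for the value, reusing any existing token."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._token_to_value[token]
```

The token is random rather than derived from the value, so it reveals nothing about the PII if it leaks without the vault.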

Compromise from insider threat

Unwanted customer impact (blast radius) from changes

  • Canary deployment
  • Exposure control through progressive blue/green deployment
  • Feature flags for dark launches and experimentation
  • In the absence of exposure control, an automated rollback process
  • A/B testing
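
Canary releases, feature flags, and A/B tests all depend on exposing a change to a controlled fraction of users. A common approach, sketched here under assumptions (the function name `in_rollout` and the 100-bucket scheme are illustrative), is to hash each user into a stable bucket and enable the feature for the first `percent` buckets.

```python
import hashlib


def in_rollout(user_id, feature, percent):
    """Deterministically map a user to one of 100 buckets for a feature
    and expose the feature to buckets below `percent`."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    exposed = sum(in_rollout(f"user-{i}", "new-checkout", 10) for i in range(10_000))
    print(f"{exposed} of 10000 users exposed")
```

Because a user's bucket never changes, raising the percentage only adds users, and dropping it to 0 acts as an immediate kill switch.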

Spread of exposure to vulnerability and attack

Business continuity

  • Continuous data replication off site
  • Secondary hot site
  • RTO/RPO acceptance from business
  • Periodic disaster recovery exercise
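
A disaster recovery exercise can be scored against the objectives the business accepted. In the sketch below, the recovery point objective (RPO) bounds how much data may be lost, measured here as replication lag, and the recovery time objective (RTO) bounds how long restoration may take. The function name and the example figures are assumptions for illustration.

```python
from datetime import timedelta


def meets_objectives(replication_lag, measured_recovery_time, rpo, rto):
    """Compare results of a recovery exercise with the agreed objectives."""
    return {
        "rpo_met": replication_lag <= rpo,
        "rto_met": measured_recovery_time <= rto,
    }


if __name__ == "__main__":
    result = meets_objectives(
        replication_lag=timedelta(minutes=2),
        measured_recovery_time=timedelta(hours=5),
        rpo=timedelta(minutes=15),  # assumed objective
        rto=timedelta(hours=4),     # assumed objective
    )
    print(result)
```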

Timely backup and recovery

Divergence of audit evidence from developer evidence

  • Automated evidence and log collection across toolchain with traceability and tagging for extraction
  • Reproducibility of the version of product state
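
One possible shape for automated, traceable evidence is a record that ties a build artifact to its source commit and carries tags for later extraction. The field names and the `evidence_record` function below are hypothetical; the point of the sketch is that the same inputs always yield the same record, so auditors and developers read identical evidence.

```python
import hashlib
import json


def evidence_record(commit, build_id, artifact_bytes, tags):
    """Build a deterministic JSON evidence record for one build artifact."""
    record = {
        "commit": commit,
        "build_id": build_id,
        # The digest lets anyone verify that a rebuilt artifact is identical.
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "tags": sorted(tags),
    }
    return json.dumps(record, sort_keys=True)
```

Sorting the keys and tags keeps the serialized record stable, which makes it straightforward to compare or hash records across toolchain runs.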

Valid source documents with completeness and accuracy

Violation of GDPR (General Data Protection Regulation of the European Union) or leak/misuse/retention of PII against rules

  • Hosting data in appropriate jurisdiction
  • Allowing deletion of end-user identifiable information (EUII)
  • Plain-language terms and conditions

Data residency, right to be forgotten, customer awareness of terms and conditions (T&C)

Hidden compromise or unknown breach of infrastructure

  • Ethical hacking, “red teaming” to identify vulnerabilities on a regular cadence
  • Monitoring data egress
  • Attack detection
  • Instrumentation to capture unusual activities
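
Monitoring data egress for unusual activity can start with something as simple as comparing the latest measurement with recent history. The sketch below uses a z-score with an assumed threshold of 3 standard deviations; the function name, the threshold, and the sample figures are illustrative, and real detection would likely use richer baselines.

```python
from statistics import mean, stdev


def is_unusual(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


if __name__ == "__main__":
    egress_mb = [100, 102, 98, 101, 99]  # hypothetical hourly egress volumes
    print(is_unusual(egress_mb, 500))
```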

Appropriate management of cyber-risk