Teaching

DevOps

Programmatically provision images.
Automatically apply configuration management to production environments.
Automatically create and maintain build environments.
Maintain test suites and measure testing quality and coverage.
Automatically generate new tests, using feedback-directed random testing, fuzzing, and data-flow analysis.
Programmatically measure code quality via static and dynamic code analysis.
Understand components of infrastructure.
Remotely regulate behavior of deployed software via feature flags and configuration servers.
Apply advanced strategies for deployment of software.
Monitor and analyze telemetry data.
Implement resilience testing on production environments (e.g., Chaos Monkey).

Course fills up in seconds; it's a competitive sport to sign up.

Hundreds of emails asking to be enrolled.

Example HW1: Automatically provision and configure a Jenkins server using Ansible.

https://github.com/CSC-DevOps/Course

Continuous Deployment Summit (I-V)

Guessing Game

How many commits are deployed daily at Netflix?

How often is the software of Disney theme park rides updated?

Classic IT

1) Send resource request.

2) Wait 3-6 months.

3) Receive resource.

IT: “Here is your server with php 5 installed”

You: “But I requested it with php 6!”

IT: … ¯\_(ツ)_/¯

DevOps

A short history of DevOps

http://itrevolution.com/the-history-of-devops/

Two sides

Operation-centric:

Manage inventory of servers automatically

Provisioned, configured automatically

Monitoring, analysis of operations

Developer centric:

Continuous deployment
Push code to production through pipeline

Class philosophy

Understand how it works
Automate all the things
You’re still not done, yet.

Automate All The Things

You’re still not done, yet*

Until you can get a clean machine and do it again.
Until it has been automatically inspected.
Until it has been fuzz tested.
Until it has been stable for several days.
Until it is pushed and runs in production.
Until it handles load testing.
Until you can patch it.
…

Skills

Configuration management:

Learn proper source control and configuration practices.

Testing and analysis

Learn how to automate testing and inspection of code

Provision and configuration

Learn how to acquire resources automatically and set them up, again, and again.

Infrastructure

Learn basics of web-scale application design.

Deployment

Learn how to deploy software in live environments.

Monitoring, analysis, and experimentation

Learn how to watch and coordinate unstable changes.

Pipeline design

Learn how to build an entire continuous deployment pipeline from scratch.

History of Continuous Deployment

Nightly Build

Build code and run smoke tests every night (Microsoft, 1995)

Benefits

It minimizes integration risk.
It reduces the risk of low quality.
It supports easier defect diagnosis.
It improves morale.

Continuous Integration

A practice where every change committed to the source repository is automatically built, tested, and analyzed.

Continuous Delivery

A practice that ensures a software change can be delivered and ready for use by a customer, by testing it in production-like environments.

Continuous Deployment

A practice where incremental software changes are automatically tested, vetted, and deployed to production environments.
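The three definitions differ mainly in how far a change travels automatically. A minimal sketch, assuming each stage can be reduced to a pass/fail check: a commit moves through the stages in order and stops at the first failure. The stage names and lambda checks are hypothetical stand-ins for real build, test, analysis, and deploy steps. Stopping after analysis corresponds roughly to continuous integration; running the final deploy stage automatically is what makes it continuous deployment.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; check returns True if the stage passed.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(commit: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order for one commit; stop at the first failure.

    Returns (reached_the_end, log), where log records each stage outcome.
    """
    log: List[str] = []
    for name, check in stages:
        ok = check(commit)
        log.append(f"{name}: {'pass' if ok else 'FAIL'}")
        if not ok:
            return False, log  # the change never reaches production
    return True, log


if __name__ == "__main__":
    # Hypothetical stages; a real pipeline would call a build tool,
    # a test runner, a static analyzer, and a deployment step.
    stages = [
        ("build", lambda c: True),
        ("unit tests", lambda c: True),
        ("static analysis", lambda c: not c.endswith("-lint-error")),
        ("deploy to production", lambda c: True),
    ]
    for commit in ["a1b2c3", "d4e5f6-lint-error"]:
        deployed, log = run_pipeline(commit, stages)
        print(commit, "deployed" if deployed else "rejected", log)
```

Ordering cheap checks (build, unit tests) before expensive ones keeps feedback fast for the common failure cases.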

Continuous * (Perpetual Development)

Example Deployment Pipeline

Exercise

Explain to a partner next to you the difference between:

a) continuous integration

b) continuous delivery

c) continuous deployment

Lessons in Continuous Deployment

Nimble Giants

Technicians manually updated thousands of servers in the park => Weekly deploy over red switch network

SAS: Month-long upgrades => Ansible deployments and yum packages.

2) Fast to Deploy, Slow to Release

Chuck Rossi at Facebook: “Get your shit in, fix it in production”

Dark Launches at Instagram

Early: Integrate as soon as possible. Find bugs early. Code can run in production about 6 months before being publicly announced.
Often: Reduce friction. Try things out. See what works. Push small changes just to gather metrics and test feasibility; large changes just slow down the team. Do dark launches to see how code performs in production, where it can be scaled up and down. "Shadow infrastructure" is too expensive, so just do it in production.
Incremental: Deploy in increments. Contain risk. Pinpoint issues.

Facebook process

Release is cut Sunday at 6pm.

Stabilize until Tuesday, run canaries, then release. The Tuesday push is 12,000 diffs.

Cherry picks: push 3 times a day (Wed-Fri), 300-700 cherry picks per day.

Rapid Release/Mozilla

When software must be deployed on-premises (e.g., a web browser):

There are three channels: Alpha, Beta, Release Candidate.

Code flows to the next channel every 2 weeks, unless fast-tracked by a release engineer.

Include testing specific to corporate customers (a practice also used by IBM and Red Hat).

Ring Deployment: Microsoft

Products like VSTS or Exchange need a slower deployment model.

Commits flow outward through the rings; if an issue appears, the change is deflighted (turned off).

Example applying model to LexisNexis*:

Ring 0 => LexisNexis Legal Department (2 people)

Ring 1 => UNC School of Law (Free broken software for students)

Ring 2 => Beta Practices

Ring 3 => Many

Ring 4 => All

“*”: Not currently implemented at LexisNexis
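The ring model can be sketched as a loop that exposes a change one ring at a time and deflights it as soon as a ring looks unhealthy. This is a minimal sketch under assumptions: the ring names come from the hypothetical LexisNexis example above, `healthy` is a hypothetical stand-in for real telemetry and feedback, deflighting turns the change off everywhere, and the waiting ("bake") period between rings is omitted.

```python
from typing import Callable, List, Tuple

# Rings from the hypothetical LexisNexis example, innermost first.
RINGS = [
    "Ring 0: LexisNexis Legal Department",
    "Ring 1: UNC School of Law",
    "Ring 2: Beta Practices",
    "Ring 3: Many",
    "Ring 4: All",
]


def roll_out(rings: List[str], healthy: Callable[[str], bool]) -> Tuple[List[str], bool]:
    """Flight a change outward one ring at a time.

    Returns (rings that end up running the change, whether it was deflighted).
    """
    flighted: List[str] = []
    for ring in rings:
        flighted.append(ring)  # expose the change to this ring
        if not healthy(ring):  # hypothetical health check (errors, user reports)
            return [], True    # deflight: turn the change off in every ring
    return flighted, False


if __name__ == "__main__":
    print(roll_out(RINGS, lambda ring: not ring.startswith("Ring 1")))
```

The small inner rings absorb most of the risk: a bad change is seen by 2 people in Ring 0 before it ever reaches "All".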

3) Every Feature is an Experiment

Experimentation

50 shades of blue: What color of ad links makes more money?

Continuous Experimentation at Bing:

https://www.ifi.uzh.ch/dam/jcr:bafebc0f-ac0c-46d9-934b-4a0d5e2aab14/Characterizing_Experimentation_SEIP2017.pdf

Twitter:

Tool tracks experiments being run
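Deciding whether one link color "makes more money" means checking that a difference between variants is larger than chance. A minimal sketch, assuming the metric is click-through rate and using a standard two-proportion z-test (one common choice, not necessarily what Bing or Twitter use). The click and view counts in the usage example are made up for illustration.

```python
import math


def two_proportion_z(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> float:
    """z statistic for the difference in click-through rate (B minus A)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both variants are equal.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se


def two_sided_p_value(z: float) -> float:
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts: shade A vs shade B, 50,000 impressions each.
    z = two_proportion_z(1000, 50_000, 1100, 50_000)
    print(f"z = {z:.2f}, p = {two_sided_p_value(z):.3f}")
```

Running many such comparisons at once (dozens of shades) inflates false positives, which is one reason experimentation platforms track every experiment being run.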

Controlling feature flags
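A feature flag lets behavior be changed remotely without redeploying. A minimal sketch, assuming flags are fetched from a configuration server (stubbed here as the hypothetical `FLAGS` dict) and that partial rollouts hash each user into a stable bucket so a given user always sees the same setting. The flag names and fields are hypothetical.

```python
import hashlib

# Stand-in for flag settings fetched from a configuration server (hypothetical).
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 10},
    "dark_mode": {"enabled": False, "rollout_percent": 100},
}


def bucket(user_id: str, flag: str) -> int:
    """Deterministically map (flag, user) to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, flags: dict = FLAGS) -> bool:
    cfg = flags.get(flag)
    if cfg is None or not cfg["enabled"]:
        return False  # unknown or switched-off flags default to off
    return bucket(user_id, flag) < cfg["rollout_percent"]


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1000)]
    on = sum(is_enabled("new_checkout", u) for u in users)
    print(f"new_checkout on for {on} of {len(users)} users")
```

Turning a feature off is then a configuration change (`"enabled": False`), not a new deploy; salting the hash with the flag name keeps different flags' rollouts independent.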

Netflix

60,000 configuration changes a day. 4000 commits a day.

Every commit creates an Amazon Machine Image (AMI).

The AMI is automatically deployed to a new RED/BLACK cluster.

Automated canary analysis: if okay, switch to the new version; if not, roll back the commit.
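The promote-or-rollback decision can be sketched as a comparison between the canary and the current (baseline) cluster. This is a deliberately simplified stand-in: it compares mean error rates against a hypothetical 20% tolerance, whereas Netflix's real canary analysis is considerably more involved and looks at many metrics. The sample numbers are made up.

```python
from statistics import mean
from typing import List


def canary_verdict(baseline_errors: List[float], canary_errors: List[float],
                   tolerance: float = 1.2) -> str:
    """Return "promote" if the canary is not clearly worse than the baseline.

    tolerance is a hypothetical threshold: the canary's mean error rate may be
    up to 20% above the baseline's before the change is rolled back.
    """
    if not baseline_errors or not canary_errors:
        raise ValueError("need samples from both clusters")
    base = mean(baseline_errors)
    canary = mean(canary_errors)
    if canary <= base * tolerance:
        return "promote"   # switch traffic to the new (e.g. black) cluster
    return "rollback"      # keep the old (e.g. red) cluster live, revert the commit


if __name__ == "__main__":
    print(canary_verdict([0.010, 0.012, 0.011], [0.011, 0.012, 0.010]))
    print(canary_verdict([0.010, 0.010], [0.050, 0.060]))
```

Because the old cluster stays up until the switch, rollback is just leaving traffic where it was.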

4) Shame and other lessons in culture

You are the Support Person

There are no manual testers at most of these companies

Quality mind-set: Your job, not someone else’s.

Facebook: You own the change from cradle to grave.
Netflix: You don’t want a late-night page going to the CEO.
Alternative: Site reliability engineers.

No silos

Fully integrated teams.
Even for specialized roles, like security, at Slack:

Walk through security issues with engineers in story planning
Consult early, not only when problems arise

Invest in Tooling

500 engineers at Facebook working on tooling.
Google has a dedicated team for developer tools and productivity.
Most companies develop their own devops infrastructure.

Blameless culture

Retrospectives

Operations Responsibility
