Principles of Software Construction: Objects, Design, and Concurrency
DevOps
Jeremy Lacomis, Christian Kaestner
17-214/514
Almost there…
Subtype Polymorphism ✓
Information Hiding, Contracts ✓
Immutability ✓
Types ✓, Static Analysis ✓
Unit Testing ✓
Domain Analysis ✓
Inheritance & Delegation ✓
Responsibility Assignment, Design Patterns, Antipattern ✓
Promises/Reactive P. ✓
Static Analysis ✓
GUI vs Core ✓
Frameworks and Libraries ✓, APIs ✓
Distributed systems, microservices ✓
Testing for Robustness ✓
CI ✓, DevOps
Design for
understanding
change/ext.
reuse
robustness
...
Small scale:
One/few objects
Mid scale:
Many objects
Large scale:
Subsystems
Testing in Production
Which design is better for signups?
Which design is better for sales?
Bing’s $100M/year experiment
How often to release a new version?
Enabling Frequent Releases & Experimentation
DevOps
Programming Reality
Today’s Topics
From CI to CD
Containers
Configuration management
Monitoring
Feature flags, testing in production
Recall: Continuous Integration
Continuous Integration
Any repetitive QA work remaining?
Releasing Software
Semantic Versioning for Releases
http://semver.org/
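As a rough illustration of the rule described at semver.org (MAJOR for incompatible API changes, MINOR for backward-compatible additions, PATCH for backward-compatible fixes), the following Python sketch bumps a version number. The names `Version`, `parse`, and `bump` and the change labels are hypothetical, and pre-release and build metadata are ignored.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.PATCH version; tuples compare component by component."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def parse(text: str) -> Version:
    """Parse a plain 'X.Y.Z' string (no pre-release or build suffix)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def bump(version: Version, change: str) -> Version:
    """Return the next version for a change labeled breaking, feature, or fix."""
    if change == "breaking":   # incompatible API change
        return Version(version.major + 1, 0, 0)
    if change == "feature":    # backward-compatible new functionality
        return Version(version.major, version.minor + 1, 0)
    if change == "fix":        # backward-compatible bug fix
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change}")
```

Comparing parsed versions as tuples avoids the string-comparison trap in which "1.10.0" sorts before "1.9.3".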
Versioning entire projects
Release management with branches
Release cycle of Facebook’s apps
Release Challenges for Mobile Apps
Any alternatives?
Server-side releases are silent, quick, and consistent
→ App as container, most content + layout from server
From Release Date to Continuous Release
Efficiency of release pipeline
https://www.slideshare.net/jmcgarr/continuous-delivery-at-netflix-and-beyond
CC BY-SA 4.0, G. Détrez
The Shifting Development-Operations Barrier
Common Release Problems?
Common Release Problems (Examples)
The Dev – Ops Divide
QA responsibilities in both roles
QA Does not Stop in Dev
DevOps
Key Ideas and Principles
Better coordination between developers and operations (collaborative)
Key goal: Reduce friction in bringing changes from development into production
Consider the entire toolchain into production (holistic)
Documentation and versioning of all dependencies and configurations ("configuration as code")
Heavy automation, e.g., continuous delivery, monitoring
Small iterations, incremental and continuous releases
Buzzword!
Common Practices
All configurations in version control
Test and deploy in containers
Automated testing, testing, testing, ...
Monitoring, orchestration, and automated actions in practice
Microservice architectures
Release frequently
Heavy Tooling and Automation
Heavy tooling and automation -- Examples
Infrastructure as code — Ansible, Terraform, Puppet, Chef
CI/CD — Jenkins, TeamCity, GitLab, Shippable, Bamboo, Azure DevOps
Test automation — Selenium, Cucumber, Apache JMeter
Containerization — Docker, Rocket, Unik
Orchestration — Kubernetes, Swarm, Mesos
Software deployment — Elastic Beanstalk, Octopus, Vamp
Measurement — Datadog, DynaTrace, Kibana, NewRelic, ServiceNow
DevOps: Tooling Overview
DevOps Tools
Tooling for Building
Configuration management, Infrastructure as Code
Puppet:
  $nameservers = ['10.0.2.3']
  file { '/etc/resolv.conf':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => template('resolver/r.conf'),
  }
Ansible:
  - hosts: all
    sudo: yes   # newer Ansible releases use "become: yes" instead
    tasks:
      - apt: name={{ item }}
        with_items:
          - ldap-auth-client
          - nscd
      - shell: auth-client-config -t nss -p lac_ldap
      - copy: src=ldap/my_mkhomedir dest=/…
      - copy: src=ldap/ldap.conf dest=/etc/ldap.conf
      - shell: pam-auth-update --package
      - shell: /etc/init.d/nscd restart
Tooling for Execution
Containers drastically simplify managing ops
Container Orchestration with Kubernetes
CC BY-SA 4.0 Khtan66
Tooling for Execution
We’ll talk about Cloud next week
How about monitoring?
Monitoring
Grafana
Testing in Production
Chaos Experiments
Crash Telemetry
What If
... we had plenty of subjects for experiments
... we could randomly assign subjects to treatment and control group without them knowing
... we could analyze small individual changes and keep everything else constant
▶ Ideal conditions for controlled experiments
Experiment Size
With enough subjects (users), we can run many, many experiments
Even very small experiments become feasible
Toward causal inference
A/B Testing
Implementing A/B Testing
Implement alternative versions of the system
Map users to treatment group
Monitor outcomes per group
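The three steps above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `Experiment` class, its method names, and the CRC32-based bucketing are hypothetical, and a production system would persist outcomes rather than keep them in memory.

```python
import zlib
from collections import defaultdict
from statistics import mean


def assign_group(user_id: str, experiment: str, treatment_percent: int = 50) -> str:
    """Map a user to a group; the same user always lands in the same group."""
    bucket = zlib.crc32(f"{experiment}:{user_id}".encode("utf-8")) % 100
    return "treatment" if bucket < treatment_percent else "control"


class Experiment:
    """Hypothetical in-memory experiment that tracks one numeric outcome."""

    def __init__(self, name: str, treatment_percent: int = 50) -> None:
        self.name = name
        self.treatment_percent = treatment_percent
        self._outcomes = defaultdict(list)  # group name -> observed values

    def variant_for(self, user_id: str) -> str:
        """Decide which alternative version of the system this user sees."""
        return assign_group(user_id, self.name, self.treatment_percent)

    def record(self, user_id: str, value: float) -> None:
        """Store an observed outcome (e.g., time on site) under the user's group."""
        self._outcomes[self.variant_for(user_id)].append(value)

    def summary(self) -> dict:
        """Report the number of observations and the mean outcome per group."""
        return {
            group: {"observations": len(values), "mean": mean(values)}
            for group, values in self._outcomes.items()
        }
```

Including the experiment name in the hashed key keeps group assignments independent across experiments, so a user in the treatment group of one experiment is not automatically in the treatment group of every other one.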
Feature Flags
Boolean options
Good practices: tracked explicitly, documented, keep them localized and independent
External mapping of flags to customers
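if (features.enabled(userId, "one_click_checkout")) {
    // new one-click checkout functionality
} else {
    // old checkout functionality
}

// external mapping of a flag to users: enabled for roughly 10% of them,
// assuming hash() returns a non-negative number
def isEnabled(user): Boolean = (hash(user.id) % 100) < 10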
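A minimal Python sketch of the same idea, keeping the mapping from flags to users outside the calling code. The `FeatureFlags` class, the `bucket` helper, and the rollout table are hypothetical. SHA-256 is used here because Python's built-in `hash()` is salted per process for strings and so would not give stable assignments.

```python
import hashlib
from typing import Mapping


def bucket(user_id: str, flag: str) -> int:
    """Return a stable bucket in [0, 100) for this user and flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


class FeatureFlags:
    """Hypothetical flag service: each flag is on for a percentage of users."""

    def __init__(self, rollout_percent: Mapping[str, int]) -> None:
        self._rollout_percent = dict(rollout_percent)

    def enabled(self, user_id: str, flag: str) -> bool:
        # Unknown flags default to off so that stale checks fail safe.
        percent = self._rollout_percent.get(flag, 0)
        return bucket(user_id, flag) < percent


features = FeatureFlags({"one_click_checkout": 10})

if features.enabled("user-42", "one_click_checkout"):
    checkout = "new one-click checkout"
else:
    checkout = "old checkout"
```

Raising the percentage in the rollout table widens the audience without touching the code at the call site, which is what makes flags convenient for gradual rollouts and experiments.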
Comparing Outcomes
Group A
base game
2158 Users
average 18:13 min time on site
Group B
game with extra god cards
10 Users
average 20:24 min time on site
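Whether such a difference is meaningful depends on the group sizes and on how much time on site varies between users. The Python sketch below computes Welch's t-statistic from the group means and sizes given above. The standard deviation is NOT part of the data above; `ASSUMED_SD` is a purely hypothetical value chosen only to make the arithmetic concrete.

```python
import math


def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> float:
    """Welch's t-statistic for the difference between two group means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


mean_a = to_seconds("18:13")   # Group A, 2158 users
mean_b = to_seconds("20:24")   # Group B, 10 users

ASSUMED_SD = 600.0  # hypothetical: 10 minutes of spread in both groups

# Group sizes as observed: B has only 10 users.
t_observed = welch_t(mean_a, ASSUMED_SD, 2158, mean_b, ASSUMED_SD, 10)

# Same means, but imagining B were as large as A.
t_balanced = welch_t(mean_a, ASSUMED_SD, 2158, mean_b, ASSUMED_SD, 2158)
```

Under this assumed spread, the statistic for the observed group sizes stays below 1, while the same means with two equally large groups would give a value well above 2. The small treatment group, rather than the size of the difference, is what limits the conclusion.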
Canary Releases
Canary Releases at Facebook
Phase 0: Automated unit tests
Phase 1: Release to Facebook employees
Phase 2: Release to subset of production machines
Phase 3: Release to full cluster
Phase 4: Commit to master, roll out everywhere
Monitored metrics: server load, crashes, click-through rate
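The phased rollout above can be sketched as a loop that widens the audience only while the monitored metrics stay healthy. This is a simplified illustration, not Facebook's implementation: the function names, the callback-based design, and the thresholds in `is_healthy` are all hypothetical, and Phase 0 (automated unit tests) is assumed to have passed already.

```python
from typing import Callable, Dict, List, Sequence

Metrics = Dict[str, float]

PHASES = [
    "Facebook employees",
    "subset of production machines",
    "full cluster",
    "everywhere",
]


def is_healthy(metrics: Metrics) -> bool:
    """Check the monitored metrics against hypothetical thresholds."""
    return (
        metrics["server_load"] < 0.8
        and metrics["crashes"] == 0
        and metrics["click_through_rate"] >= 0.02
    )


def canary_rollout(
    phases: Sequence[str],
    deploy: Callable[[str], None],
    collect_metrics: Callable[[str], Metrics],
    rollback: Callable[[str], None],
    healthy: Callable[[Metrics], bool] = is_healthy,
) -> bool:
    """Deploy phase by phase; undo every completed phase on the first failure."""
    completed: List[str] = []
    for phase in phases:
        deploy(phase)
        completed.append(phase)
        if not healthy(collect_metrics(phase)):
            # Roll back in reverse order, most recent phase first.
            for done in reversed(completed):
                rollback(done)
            return False
    return True
```

Passing `deploy`, `collect_metrics`, and `rollback` as callbacks keeps the sketch independent of any particular deployment or monitoring tool.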
Further readings:
Tang, Chunqiang, Thawan Kooburat, Pradeep Venkatachalam, Akshay Chander, Zhe Wen, Aravind Narayanan, Patrick Dowell, and Robert Karl. Holistic Configuration Management at Facebook. In Proceedings of the 25th Symposium on Operating Systems Principles, pp. 328-343. ACM, 2015.
Rossi, Chuck, Elisa Shibley, Shi Su, Kent Beck, Tony Savor, and Michael Stumm. Continuous Deployment of Mobile Software at Facebook (Showcase). In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 12-23. ACM, 2016.
Real DevOps Pipelines are Complex
Chunqiang Tang, Thawan Kooburat, Pradeep Venkatachalam, Akshay Chander, Zhe Wen, Aravind Narayanan, Patrick Dowell, and Robert Karl. Holistic Configuration Management at Facebook. Proc. of SOSP: 328--343 (2015).
Summary
Increasing automation of tests and deployments
Containers and configuration management tools help with automation, deployment, and rollbacks
Monitoring becomes important
Many new opportunities for testing in production (feature flags are common)