Growing Pains
Cory Zue
@czue
Who am I?
2007-2017
CTO, Dimagi
Today
“Chief Accelerator”
Builder / Entrepreneur
Freelance Consultant
Emoji-Lover
Why this talk?
Growing is hard.
Especially from 1-100 people (or servers).
It’s probably hard past 100 too, but I don’t know much about that.
Three Parts to This Talk
Scaling Code
Scaling your Stack
Scaling Teams
Scaling Code: Fighting Complexity
A codebase
Enter: abstraction
System
Glue
Subsystem
Subsystem
Subsystem
A more sustainable codebase
How to abstract?
No structure
“Organized” monolith
Modules define public APIs via __init__.py
Shared modules using git submodules
Shared libraries using PyPI
Service Oriented Architecture
Spectrum: least short-term work, least long-term scalable (No structure) → most short-term work, most long-term scalable (Service Oriented Architecture)
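The “Organized” monolith option relies on each module exposing a small public API from its `__init__.py`. The sketch below assumes a hypothetical `billing` module with a private `_invoices` submodule. It writes that two-file package to a temporary directory and imports it, so the whole example runs as a single script.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical two-file package: the implementation lives in a private
# submodule, and billing/__init__.py re-exports only the public API.
PACKAGE_FILES = {
    "billing/_invoices.py": """
        def create_invoice(customer_id, amount_cents):
            _validate(amount_cents)
            return {"customer": customer_id, "amount_cents": amount_cents}

        def _validate(amount_cents):
            # Internal helper; the leading underscore marks it as private.
            if amount_cents <= 0:
                raise ValueError("amount must be positive")
    """,
    "billing/__init__.py": """
        # Public API of the billing module. Other modules should import
        # from `billing`, never from `billing._invoices` directly.
        from billing._invoices import create_invoice

        __all__ = ["create_invoice"]
    """,
}


def write_package(root: Path) -> None:
    """Write each file in PACKAGE_FILES under root, creating directories."""
    for relative_path, source in PACKAGE_FILES.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(source))


with tempfile.TemporaryDirectory() as tmp:
    write_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()  # make sure the new files are noticed
        billing = importlib.import_module("billing")
    finally:
        sys.path.remove(tmp)

print(billing.__all__)  # ['create_invoice']
print(billing.create_invoice("c-42", 1999))  # {'customer': 'c-42', 'amount_cents': 1999}
```

Python does not enforce this boundary (`billing._invoices` is still importable). The underscore prefix and `__all__` are conventions, so code review and linters are what keep other modules on the public API.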
Rough Guidelines
Execute for the short term (< 1 year)
Plan for the long term (1-3 years)
Design to be changed (always)
All the things!
A few other quick tips
Clean all the things!
Rule: Always leave code cleaner than when you found it
Run flake8 in CI
Conduct refactorathons
Budget time for tech debt
Review all the things!
Code review is critical
Conduct design reviews for big features when they are about 10% done
(Automated) Test all the things!
Test while coding
Write failing tests before fixing bugs
Build a CI pipeline
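In practice, “write failing tests before fixing bugs” means turning the bug report into a test, watching it fail, then fixing the code until it passes. The sketch below uses a hypothetical `parse_duration` helper and pytest-style test functions (plain `assert`, discovered by their `test_` prefix). The bug and its fix are invented for illustration.

```python
# Hypothetical bug report: parse_duration("1h30m") returned 30 instead of 90,
# because the buggy version overwrote `total` at each unit instead of adding.


def parse_duration(text: str) -> int:
    """Parse strings like '2h', '45m' or '1h30m' into a number of minutes."""
    total = 0
    digits = ""
    for char in text.strip().lower():
        if char.isdigit():
            digits += char
        elif char in ("h", "m") and digits:
            minutes = int(digits) * (60 if char == "h" else 1)
            total += minutes  # the fix: accumulate instead of `total = minutes`
            digits = ""
        else:
            raise ValueError(f"cannot parse duration {text!r}")
    if digits:
        raise ValueError(f"missing unit in {text!r}")
    return total


# Written first, from the bug report; it failed until the fix above was made.
def test_hours_and_minutes_are_added():
    assert parse_duration("1h30m") == 90


# Existing behaviour that the fix must not break.
def test_single_units():
    assert parse_duration("2h") == 120
    assert parse_duration("45m") == 45
```

Once the regression test lives in the suite, the CI pipeline runs it on every change, so the same bug cannot quietly return.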
Scaling Your Stack: Fighting Scale
[Diagram: Web Server, Django, Database. Legend: Python App Code, Web Server, Database / 3rd Party]
The site is getting slow!
[Diagram: adds a Cache next to Django and the Database]
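A cache like this is typically used in a cache-aside pattern: check the cache first, and only on a miss run the slow query and store its result with an expiry. The sketch below uses a tiny in-process `TTLCache` as a hypothetical stand-in for a real backend such as Memcached or Redis. In Django, `cache.get_or_set(key, default, timeout)` from `django.core.cache` provides the same check-then-store behaviour.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Minimal in-process cache with per-key expiry (stand-in for Memcached/Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return default
        return value

    def set(self, key: str, value: Any, timeout: float) -> None:
        self._entries[key] = (self._clock() + timeout, value)


cache = TTLCache()
_MISSING = object()  # sentinel so a cached None still counts as a hit


def get_or_compute(key: str, compute: Callable[[], Any], timeout: float = 300) -> Any:
    """Cache-aside: return the cached value, or compute, store, and return it."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute()
        cache.set(key, value, timeout)
    return value


query_count = 0


def slow_dashboard_query() -> dict:
    # Hypothetical expensive database query.
    global query_count
    query_count += 1
    return {"active_users": 1234}


first = get_or_compute("dashboard:stats", slow_dashboard_query)
second = get_or_compute("dashboard:stats", slow_dashboard_query)
print(first == second, query_count)  # True 1
```

The timeout is the main design knob: longer timeouts take more load off the database but serve staler data.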
We need background processing!
[Diagram: adds a Background Processor (Celery) alongside Django, the Database, and the Cache]
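The core idea behind a background processor like Celery is that the web request only enqueues work, and a separate worker executes it later. The sketch below shows that idea inside one process using the standard library's `queue` and `threading` modules. It is a stand-in, not Celery's API: with Celery, the task would be a function decorated with `@app.task` and enqueued with `.delay(...)`, and the queue would live in a broker so workers can run on other machines. `send_welcome_email` and `signup_view` are hypothetical names.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for a message broker
sent_emails = []  # records what the worker did, for the demo


def send_welcome_email(user_id: int) -> None:
    # Hypothetical slow task, e.g. a call to an email provider's API.
    sent_emails.append(f"welcome email sent to user {user_id}")


def worker() -> None:
    """Run queued tasks forever, one at a time."""
    while True:
        func, args = task_queue.get()
        try:
            func(*args)
        except Exception as exc:
            print(f"task failed: {exc!r}")  # a real system would log and maybe retry
        finally:
            task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def signup_view(user_id: int) -> dict:
    """The request handler enqueues the slow work and returns immediately."""
    task_queue.put((send_welcome_email, (user_id,)))
    return {"status": "queued", "user_id": user_id}


response = signup_view(42)
task_queue.join()  # demo only: wait for the worker so we can inspect results
print(response, sent_emails)
# {'status': 'queued', 'user_id': 42} ['welcome email sent to user 42']
```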
The machines can’t keep up with the load!
[Diagram: multiple Django and Celery instances sharing the Database and Cache]
The database can’t keep up with the load!
[Diagram: adds Database Read Replica(s)]
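In Django, one way to send reads to replicas is a database router: a plain class listed in the `DATABASE_ROUTERS` setting whose `db_for_read` and `db_for_write` methods return a database alias. The sketch below is modeled on the primary/replica router example in Django's multiple-databases documentation. It assumes `settings.DATABASES` defines a primary `"default"` alias plus two hypothetical replica aliases, `"replica1"` and `"replica2"`.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the primary and spread reads across the replicas."""

    replicas = ["replica1", "replica2"]  # hypothetical aliases in DATABASES

    def db_for_read(self, model, **hints):
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds a copy of the same data, so relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the primary; replicas receive changes via replication.
        return db == "default"


# In settings.py (the module path is hypothetical):
# DATABASE_ROUTERS = ["myproject.routers.PrimaryReplicaRouter"]
```

Replication lag is the main caveat: a read right after a write may hit a replica that has not caught up yet, so such reads can be pinned to the primary with `Model.objects.using("default")`.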
Our reports are taking way too long!
[Diagram: adds a Streaming Platform, Stream Processing workers, and an Analytics Database]
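This setup moves report work off the request path: the app publishes events to the streaming platform, and stream processors keep pre-aggregated tables in the analytics database up to date, so a report reads a few precomputed rows instead of scanning raw data. In the sketch below, a Python list stands in for the stream and a `defaultdict` for the analytics table. The `FormSubmitted` event and its fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import DefaultDict, Iterable, Tuple


@dataclass(frozen=True)
class FormSubmitted:
    """Hypothetical event the app publishes whenever a form is submitted."""
    project_id: int
    day: date


ReportTable = DefaultDict[Tuple[int, date], int]


def process_stream(events: Iterable[FormSubmitted], table: ReportTable) -> None:
    """Consume events and keep per-project daily counts up to date."""
    for event in events:
        table[(event.project_id, event.day)] += 1


def daily_submissions(table: ReportTable, project_id: int, day: date) -> int:
    """The report is now a single lookup instead of a scan over raw data."""
    return table.get((project_id, day), 0)


analytics_table: ReportTable = defaultdict(int)  # stand-in for the analytics DB
stream = [  # stand-in for events read from the streaming platform
    FormSubmitted(project_id=1, day=date(2018, 5, 1)),
    FormSubmitted(project_id=1, day=date(2018, 5, 1)),
    FormSubmitted(project_id=2, day=date(2018, 5, 1)),
]
process_stream(stream, analytics_table)
print(daily_submissions(analytics_table, 1, date(2018, 5, 1)))  # 2
```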
It’s the type of data in the database!
[Diagram: adds a “Blob” Database]
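When the problem is the type of data, a common fix is to keep large binary payloads (attachments, images, raw submissions) out of the relational database. The bytes go to a blob store (for example, S3-style object storage) and the row keeps only a small reference. In the sketch below, a dictionary and a list stand in for the blob store and the table, and keys are SHA-256 content hashes. The names and the hashing choice are illustrative assumptions.

```python
import hashlib
from typing import Dict, List

blob_store: Dict[str, bytes] = {}  # stand-in for a blob/object store
attachment_rows: List[dict] = []  # stand-in for a relational table


def save_attachment(form_id: int, content: bytes) -> str:
    """Put the bytes in the blob store; keep only a small reference row."""
    key = hashlib.sha256(content).hexdigest()  # content-addressed key
    blob_store[key] = content
    attachment_rows.append({"form_id": form_id, "blob_key": key, "size": len(content)})
    return key


def load_attachment(form_id: int) -> bytes:
    """Look up the reference row, then fetch the bytes from the blob store."""
    row = next(r for r in attachment_rows if r["form_id"] == form_id)
    return blob_store[row["blob_key"]]


key = save_attachment(7, b"fake image bytes")
print(len(key), load_attachment(7) == b"fake image bytes")  # 64 True
```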
The data is just too big!
[Diagram: the single Database becomes multiple Databases]
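When a single database can no longer hold the data, one common approach is to partition (shard) it across several databases. A minimal sketch, assuming data can be split by a hypothetical tenant ID across four hypothetical database aliases: the tenant ID is hashed with `hashlib` so every process maps it to the same shard. Python's built-in `hash()` is randomized per process for strings, so it is unsuitable here.

```python
import hashlib
from collections import Counter

# Hypothetical database aliases, one per shard.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(tenant_id: str) -> str:
    """Map a tenant to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Every tenant always lands on the same shard, and tenants spread across shards.
print(shard_for("acme") == shard_for("acme"))  # True
print(Counter(shard_for(f"tenant-{i}") for i in range(1000)))
```

In Django, the chosen alias could be passed to `QuerySet.using(...)`. Note that the modulo mapping moves most tenants whenever the shard count changes; a lookup table or consistent hashing avoids that, at the cost of extra machinery.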
Rough Guidelines
Execute for the short term (< 1 year)
Plan for the long term (1-3 years)
Design to be changed (always)
All the things!
A few other quick tips
Learn all the things!
Build DevOps capacity
Get to know your tools well. Really well.
Monitor all the things!
Uptime → Pingdom, StatusCake, others
Servers → New Relic, Datadog, Munin, others
Record all the things!
Logs
ELK (Elasticsearch, Logstash, Kibana)
Datadog
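Logs are far easier to search in a pipeline like ELK if each line is structured. A minimal sketch using only the standard `logging` module: a custom `Formatter` that emits one JSON object per record. The `user_id` and `request_id` field names and the `myapp` logger name are hypothetical, and shipping the output to Elasticsearch (for example via Logstash) is not shown.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    EXTRA_FIELDS = ("user_id", "request_id")  # hypothetical context fields

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra={...}` become attributes on the record.
        for name in self.EXTRA_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("form submitted", extra={"user_id": 42})
```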
Orchestrate all the things!
Ansible (or Salt, Fabric, Puppet/Chef)
Scaling Teams: Fighting Chaos
Two Common Types of Organic, Small Team Growth
A single person (or small group of people) owns something.
Common examples:
Superhero Model
The “herd” (aka everyone on the team) has distributed, self-organizing ownership
Common examples:
Herd Model
How this works over time
Superheroes and Growth
Herds and Growth
Organic → Designed
Organic Systems
Happen unintentionally over a long period of time
Exist because “that’s how it’s always worked”
Can often be hard to explain / justify
Designed Systems
Are intentionally designed and revised over time
Exist to fulfil a purpose / meet a need
Can usually be explained / justified
Trying to explain a system to a new hire is a good litmus test of which type of system it is
Designed Solutions
Specialization
Replace heroes with roles
Tools
Remove friction from unscalable systems / processes
Processes
Bring order to the chaos of the herd
Example: Organic Support
People email us with issues
The email goes to the whole tech/product team (herd)
Whoever looks first (usually superhero Alice) responds
If it’s a bug they assign it to the person responsible for the bug (herd knowledge)
Example: Designed Support
Support tickets go into a queue in our support system (tool) that is triaged by our support team (specialization)
The team follows a defined triage, escalation, and response procedure (process)
If the ticket is a bug, it is moved to a different queue (tool/process) and picked up, according to priority, by one of our rotating support engineers (specialization / process)
Example: Organic Code Review
Make a pull request, and whoever (herd) gets to it first (usually superhero Bob) reviews it and either merges it or leaves comments
If no one has reviewed it after a few days, you can start pinging people (herd) on Slack
Example: Designed Code Review
Every developer is assigned a code buddy who is responsible for reviewing their pull requests (process)
Additionally, you should ping the defined area owner of that part of the codebase (specialization) for a secondary review.
Pull requests should only be merged after all required reviewers have approved (process) and all tests pass (tool)
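Parts of a designed review process can be encoded in a tool. The sketch below is a hypothetical merge check that lists what still blocks a pull request: the code buddy's approval, the area owner's approval, and a passing test run. `PullRequest` and `merge_blockers` are illustrative names. Hosted platforms offer similar built-in rules, for example GitHub's protected branches with required reviews and status checks, and `CODEOWNERS` files for area owners.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PullRequest:
    author: str
    approved_by: Set[str] = field(default_factory=set)
    tests_passed: bool = False


def merge_blockers(pr: PullRequest, code_buddy: str, area_owner: str) -> List[str]:
    """Return the reasons a PR cannot be merged yet (empty list means mergeable)."""
    blockers = []
    if code_buddy not in pr.approved_by:
        blockers.append(f"waiting for code buddy {code_buddy}")
    if area_owner not in pr.approved_by:
        blockers.append(f"waiting for area owner {area_owner}")
    if not pr.tests_passed:
        blockers.append("tests have not passed")
    return blockers


pr = PullRequest(author="carol", approved_by={"dave"}, tests_passed=True)
print(merge_blockers(pr, code_buddy="dave", area_owner="erin"))
# ['waiting for area owner erin']
```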
Culture: Organic → Designed
Like everything else, your culture will initially form organically, shaped by your founding team and leadership
As you scale, you want to shift to a more designed culture as well; otherwise you will carry forward both its good and bad qualities
Culture will change regardless, so your job is to help it change in a positive way
Example: Organic Culture
We hire people who are like us in background / perspective / race / gender / etc. because that’s what’s worked in the past
We all socialize together outside of work because we always have
We critique each other’s work openly, harshly, and regularly because that’s what the CEO and CTO did
Example: Designed Culture
We aspire to be a diverse organization, so we work hard to remove bias from our interview process and attract candidates from all walks of life
What happens at work is what’s important; what you choose to do outside of work is up to you
We work hard to have supportive conversations that address the real problems without hurting people
Rough Guidelines
Execute for the short term
Plan for the long term
Design to be changed
Questions?
www.coryzue.com, @czue
www.dimagi.com (we’re hiring!)