1 of 52

Growing Pains

Cory Zue

@czue

2 of 52


3 of 52

Who am I?

2007-2017

CTO, Dimagi

Today

“Chief Accelerator”

Builder / Entrepreneur

Freelance Consultant

Emoji-Lover

4 of 52

Why this talk?

Growing is hard.

Especially going from 1 to 100 people (or servers).

It’s probably hard past 100 too, but I don’t know much about that.

5 of 52

Three Parts to This Talk

Scaling Code

Scaling your Stack

Scaling Teams

6 of 52

Scaling Code: Fighting Complexity

7 of 52

A codebase

8 of 52

A codebase

  • Interdependencies grow much faster than the code itself
  • No longer fits in one person’s head
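One rough way to picture how quickly interdependencies pile up: if any component may depend on any other, n components have n(n-1)/2 possible pairwise links, so doubling the number of components roughly quadruples the links. This sketch only counts *potential* links; real codebases are sparser, but the trend is the point.

```python
def max_pairwise_links(n: int) -> int:
    """Number of distinct pairs among n components: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (5, 10, 50, 100):
    print(n, max_pairwise_links(n))
# 5 10
# 10 45
# 50 1225
# 100 4950
```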

9 of 52

A codebase

  • Takes longer to reason about
  • Harder / slower to test things
  • Takes longer to onboard new people

10 of 52

Enter: abstraction

  • Reduce the size of any single component so that it fits in one person’s head
  • Allows for specialization, if necessary

Diagram: a System made up of several Subsystems connected by Glue

11 of 52

A more sustainable codebase

12 of 52

A more sustainable codebase

13 of 52

A more sustainable codebase

14 of 52

A more sustainable codebase

15 of 52

How to abstract?

A spectrum of options, from least short-term work / least long-term scalable to most short-term work / most long-term scalable:

  • No structure
  • “Organized” monolith
  • Modules define public APIs via __init__.py
  • Shared modules using git submodules
  • Shared libraries using PyPI
  • Service Oriented Architecture
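To make "modules define public APIs via __init__.py" concrete, here is a minimal sketch. The `billing` package, its `_invoices` module, and `create_invoice` are hypothetical names; the files are written to a temporary directory only so the example runs on its own. The convention shown (leading underscore for internals, `__init__.py` re-exporting the public names and listing them in `__all__`) is one common way to do this, not the only one.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical "billing" package written to a temp dir so the sketch runs
# on its own. In a real project these would just be files in your repo.
FILES = {
    "billing/_invoices.py": """
        def _normalize(name):  # private helper, not part of the public API
            return name.strip().lower()

        def create_invoice(customer, amount):
            return {"customer": _normalize(customer), "amount": amount}
    """,
    "billing/__init__.py": """
        # Public API: other apps import from "billing", never "billing._invoices".
        from ._invoices import create_invoice

        __all__ = ["create_invoice"]
    """,
}

root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(textwrap.dedent(source))

sys.path.insert(0, str(root))
importlib.invalidate_caches()
billing = importlib.import_module("billing")

print(billing.create_invoice(" Acme ", 100))  # {'customer': 'acme', 'amount': 100}
print(hasattr(billing, "_normalize"))          # False
```

Other apps only ever touch `billing.create_invoice`, so the internals of `_invoices.py` can be reorganized freely as long as that public surface stays the same.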

16 of 52

Rough Guidelines

Execute for the short term (< 1 year)

Plan for the long term (1-3 years)

Design to be changed (always)

(Shown against the same spectrum: least short-term work and least long-term scalability at one end, most short-term work and most long-term scalability at the other.)

17 of 52

All the things!

A few other quick tips

18 of 52

Clean all the things!

Rule: Always leave code cleaner than when you found it

Run flake8 (linting) in CI

Conduct refactorathons

Budget time and people for paying down tech debt

19 of 52

Review all the things!

Code review is critical

Hold design reviews for big features when they are about 10% done

20 of 52

(Automated) Test all the things!

Test while coding

Write failing tests before fixing bugs

Build a CI pipeline
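A minimal sketch of "write failing tests before fixing bugs" with the standard `unittest` module. The function `parse_quantity` and the bug report it fixes are hypothetical: the regression test is written first, confirmed to fail against the buggy version (a plain `int(text)`), and then the fix makes it pass.

```python
import unittest


def parse_quantity(text: str) -> int:
    """Parse a user-entered quantity such as "12" or "1,200".

    Hypothetical bug: "1,200" used to raise ValueError because the original
    version was just int(text). The fix strips whitespace and thousands
    separators before converting.
    """
    return int(text.strip().replace(",", ""))


class ParseQuantityTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_quantity("12"), 12)

    def test_thousands_separator_regression(self):
        # Written first, while the bug still existed, and seen to fail
        # before the fix above was applied.
        self.assertEqual(parse_quantity("1,200"), 1200)
```

Saved as, say, `test_quantity.py`, this runs with `python -m unittest test_quantity`; once it is in the CI pipeline, the same bug cannot quietly come back.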

21 of 52

Scaling Your Stack: Fighting Scale

22 of 52

Diagram: Web Server → Django (Python App Code) → Database

23 of 52

The site is getting slow!

Diagram: Web Server → Django (Python App Code), now with a Cache alongside the Database / 3rd Party services
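The usual pattern behind "add a cache" is cache-aside: look in the cache first, and only on a miss do the slow work (database query or 3rd-party call) and store the result. The tiny in-process `TTLCache` below is a stand-in so the sketch runs anywhere; in a Django project you would normally use `django.core.cache` (for example `cache.get_or_set`) backed by Memcached or Redis instead. `slow_dashboard_query` and the key name are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache with per-entry expiry (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # stale: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any, timeout: float) -> None:
        self._data[key] = (time.monotonic() + timeout, value)


cache = TTLCache()


def cached(key: str, compute: Callable[[], Any], timeout: float = 300) -> Any:
    """Cache-aside: return the cached value, or compute it and store it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def slow_dashboard_query():
    calls.append(1)  # count how often the "expensive" work really runs
    return {"active_users": 42}


print(cached("dashboard:stats", slow_dashboard_query))  # {'active_users': 42}
print(cached("dashboard:stats", slow_dashboard_query))  # {'active_users': 42}
print(len(calls))  # 1
```

One design caveat of this sketch: a result of `None` is indistinguishable from a miss, so it is never cached.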

24 of 52

We need background processing!

Diagram: Web Server → Django (Python App Code) with Cache and Database / 3rd Party, plus Celery (background processor)
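The idea behind adding Celery is that a web request only *enqueues* slow work and returns immediately, while a separate worker does it. The sketch below uses the standard library (`queue` plus a worker thread) as a stand-in for Celery and its broker so that it runs on its own; it is not Celery's API. With Celery you would decorate the task with `@app.task`, call `send_welcome_email.delay(address)`, and run workers as separate processes. `send_welcome_email` and `signup_view` are hypothetical.

```python
import queue
import threading

# Stand-in for a message broker (e.g. Redis or RabbitMQ behind Celery).
task_queue = queue.Queue()
sent_emails = []


def send_welcome_email(address: str) -> None:
    """Hypothetical slow task: pretend to call an email provider."""
    sent_emails.append(address)


def worker() -> None:
    """Background worker loop: pull tasks off the queue and run them."""
    while True:
        func, args = task_queue.get()
        try:
            func(*args)
        finally:
            task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def signup_view(address: str) -> str:
    """Web request handler: enqueue the slow work and respond right away."""
    task_queue.put((send_welcome_email, (address,)))
    return "Thanks for signing up!"


print(signup_view("ada@example.com"))  # Thanks for signing up!
task_queue.join()  # demo only: wait until the worker has finished
print(sent_emails)  # ['ada@example.com']
```

Unlike a real broker, this in-memory queue loses pending tasks if the process restarts, which is one reason to reach for a dedicated tool like Celery.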

25 of 52

The machines can’t keep up with the load!

Diagram: the Django (Python App Code) and Celery (background processor) tiers each scaled out to multiple machines (three Django, two Celery), sharing one Cache and Database / 3rd Party
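Running several identical Django machines only works if something spreads requests across them. Round-robin, sketched below, is the simplest strategy and is, for example, nginx's default for upstream servers. The server names are hypothetical, and in practice a load balancer does this for you. The important design consequence is that app servers must be stateless (sessions and shared data live in the Cache or Database), so that any machine can serve any request.

```python
import itertools
from typing import Iterator, List

# Hypothetical pool of identical Django app servers behind a load balancer.
APP_SERVERS: List[str] = ["django-1:8000", "django-2:8000", "django-3:8000"]


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in turn, looping forever."""
    return itertools.cycle(servers)


picker = round_robin(APP_SERVERS)
for request_id in range(5):
    print(request_id, "->", next(picker))
# 0 -> django-1:8000
# 1 -> django-2:8000
# 2 -> django-3:8000
# 3 -> django-1:8000
# 4 -> django-2:8000
```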

26 of 52

The database can’t keep up with the load!

Diagram: multiple Django (Python App Code) and Celery (background processor) machines, a Cache, and the primary Database / 3rd Party, now with Database read replica(s) added
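In Django, read replicas are usually wired in with a database router: a plain class whose `db_for_read`, `db_for_write`, `allow_relation`, and `allow_migrate` methods follow Django's router protocol. The sketch below sends writes to the primary and spreads reads across replicas. The aliases `default`, `replica1`, and `replica2` and the module path in the settings comment are assumptions and must match your own `DATABASES` setting.

```python
import random

# Hypothetical settings.py entry enabling this router:
# DATABASE_ROUTERS = ["myproject.routers.PrimaryReplicaRouter"]


class PrimaryReplicaRouter:
    """Send writes to the primary and spread reads across replicas."""

    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds a copy of the same data, so relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the primary; replicas receive its schema changes.
        return db == "default"
```

One caveat worth planning for: replicas lag slightly behind the primary, so a read issued right after a write may not see it yet.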

27 of 52

Our reports are taking way too long!

Diagram: Django and Celery machines, a Cache, and the Database with read replica(s), plus a Streaming Platform feeding Stream Processing workers that populate an Analytics Database
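The reason stream processing helps slow reports: instead of scanning raw data at report time, a processor consumes change events as they happen and keeps small pre-aggregated tables up to date in the analytics database. In the sketch below a Python list stands in for the streaming platform (for example Kafka) and a dict stands in for the analytics table; the `FormSubmitted` event and its fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class FormSubmitted:
    """Hypothetical event published whenever a user submits a form."""
    project: str
    day: str  # ISO date, e.g. "2017-06-01"


# Stand-in for an analytics table: (project, day) -> submission count.
daily_submissions: Dict[Tuple[str, str], int] = defaultdict(int)


def process_stream(events: Iterable[FormSubmitted]) -> None:
    """Consume events one at a time and keep the summary table current."""
    for event in events:
        daily_submissions[(event.project, event.day)] += 1


def report(project: str, day: str) -> int:
    """A report becomes a single lookup instead of a scan of raw data."""
    return daily_submissions.get((project, day), 0)


stream = [
    FormSubmitted("clinic-a", "2017-06-01"),
    FormSubmitted("clinic-a", "2017-06-01"),
    FormSubmitted("clinic-b", "2017-06-01"),
]
process_stream(stream)
print(report("clinic-a", "2017-06-01"))  # 2
print(report("clinic-b", "2017-06-02"))  # 0
```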

28 of 52

It’s the type of data in the database!

Diagram: Django and Celery machines, a Cache, the Database with read replica(s), a Streaming Platform, Stream Processing, and an Analytics Database, now joined by a “Blob” Database
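A "blob" database addresses data that doesn't belong in relational tables, such as attachments, images, or large raw documents. The common pattern is to put the bytes in an object store and keep only small, queryable fields plus a key in the relational row. The `BlobStore` below is an in-memory stand-in for a real object store (such as Amazon S3), and `save_attachment` is hypothetical. Keying blobs by their SHA-256 hash is one design choice among several; it has the side effect of deduplicating identical uploads.

```python
import hashlib
from typing import Dict


class BlobStore:
    """In-memory stand-in for an object store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return a content-addressed key (its SHA-256)."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


blobs = BlobStore()


def save_attachment(filename: str, data: bytes) -> dict:
    """Store the bytes as a blob; return the small row for the SQL table."""
    key = blobs.put(data)
    return {"filename": filename, "size": len(data), "blob_key": key}


row = save_attachment("photo.jpg", b"...pretend these are JPEG bytes...")
print(row["filename"], row["size"])
```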

29 of 52

The data is just too big!

Diagram: the same components (Django and Celery machines, Cache, read replica(s), Streaming Platform, Stream Processing, Analytics Database, “Blob” Database), with the main Database now expanded into multiple Databases
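The slide doesn't say how the data is split across databases; horizontal partitioning (sharding) by a key such as a customer or project name is one common answer, and it is assumed here. The key detail is a *stable* hash: Python's built-in `hash()` is randomized per process for strings, so it must not be used to route data. The shard names are hypothetical.

```python
import hashlib

# Hypothetical database aliases, one per shard.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key: str) -> str:
    """Map a partition key (e.g. a customer name) to a shard, stably."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


print(shard_for("acme"))  # the same shard every run, on every machine
```

A known limitation of plain modulo hashing: changing the number of shards remaps most keys, so teams often use consistent hashing or an explicit key-to-shard lookup table instead.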

30 of 52

Rough Guidelines

Execute for the short term (< 1 year)

Plan for the long term (1-3 years)

Design to be changed (always)

(Shown against the same spectrum: least short-term work and least long-term scalability at one end, most short-term work and most long-term scalability at the other.)

31 of 52

All the things!

A few other quick tips

32 of 52

Learn all the things!

Build DevOps capacity

Get to know your tools well. Really well.

33 of 52

Monitor all the things!

Uptime → Pingdom, StatusCake, others

Servers → New Relic, Datadog, Munin, others

34 of 52

Record all the things!

Logs

ELK (Elasticsearch, Logstash, Kibana)

Datadog
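Log pipelines like ELK or Datadog are much more useful when each log line is structured rather than free text. Python's standard `logging` module has no built-in JSON formatter, so a small subclass is a common approach; the sketch below emits one JSON object per line. The logger name `myapp` and the extra fields `user_id` and `request_id` are assumptions; adapt them to whatever you want to search on.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for field in ("user_id", "request_id"):
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("invoice created", extra={"user_id": 42, "request_id": "abc123"})
```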

35 of 52

Orchestrate all the things!

Ansible (Salt, Fabric, Puppet/Chef)

36 of 52

Scaling Teams: Fighting Chaos

37 of 52

Two Common Types of Organic, Small Team Growth

A single person (or small group of people) owns something.

Common examples:

  • Code review
  • People management
  • Support / firefighting

Superhero Model

38 of 52

Two Common Types of Organic, Small Team Growth

The “herd” (aka everyone on the team) has distributed, self-organizing ownership

Common examples:

  • Knowledge management
  • Bug fixes
  • Who does what

Herd Model

39 of 52

How this works over time

40 of 52

Superheroes and Growth

41 of 52

Herds and Growth

42 of 52

Organic → Designed

Organic Systems

Happen unintentionally over a long period of time

Exist because “that’s how it’s always worked”

Can often be hard to explain / justify

Designed Systems

Are intentionally designed and revised over time

Exist to fulfil a purpose / meet a need

Can usually be explained / justified

Trying to explain a system to a new hire is a good litmus test of which type of system it is

43 of 52

Designed Solutions

Specialization

Replace heroes with roles

Tools

Remove friction from unscalable systems / processes

Processes

Bring order to the chaos of the herd

44 of 52

Example: Organic Support

People email us with issues

The email goes to the whole tech/product team (herd)

Whoever looks first (usually superhero Alice) responds

If it’s a bug they assign it to the person responsible for the bug (herd knowledge)

45 of 52

Example: Designed Support

Support tickets go into a queue in our support system (tool) that is triaged by our support team (specialization)

The team triages and follows a defined escalation and response procedure (process)

If the ticket is a bug it is put in a different queue (tool/process) and is picked up according to priority by one of our rotating support engineers (specialization / process)

46 of 52

Example: Organic Code Review

Make a pull request and whoever (herd) gets to it (usually superhero Bob) reviews it and either merges it or leaves comments

If no one reviews it after a few days you can start pinging people (herd) on Slack

47 of 52

Example: Designed Code Review

Every developer is assigned a code buddy who is responsible for reviewing their pull requests (process)

Additionally, you should ping the designated area owner of that part of the codebase (specialization) for a secondary review.

Pull requests should only be merged after the reviewers approve (process) and all tests pass (tool)

48 of 52

Culture: Organic → Designed

Much like other things, your culture will also be determined organically based on your founding team and leadership

As you scale, you want to shift toward a more designed culture as well; otherwise you will keep both the good and the bad qualities

Culture will change, so your job is to steer that change in a positive direction

49 of 52

Example: Organic Culture

We hire people who are like us in background / perspective / race / gender / etc. because that’s what’s worked in the past

We all socialize together outside of work because we always have

We critique each other’s work openly, harshly, and regularly because that’s what the CEO and CTO did

50 of 52

Example: Designed Culture

We aspire to be a diverse organization and so we work hard to remove bias from our interview process and attract candidates from all walks of life

What happens at work is what’s important; what you choose to do outside of work is up to you

We work hard to have supportive conversations that address the real problems without hurting people

51 of 52

Rough Guidelines

Execute for the short term

Plan for the long term

Design to be changed

(Shown against the same spectrum: least short-term work and least long-term scalability at one end, most short-term work and most long-term scalability at the other.)

52 of 52

Questions?

www.coryzue.com, @czue

www.dimagi.com (we’re hiring!)