1 of 29

Lessons from reluctant data engineering

2 of 29

Disclaimer #1: I am not a data engineer

3 of 29

Data engineering: More valuable than data science?

4 of 29

Why am I here today?

  • 2012: My first data science job
  • 2013: My first head of data science job
  • 2015: My first enterprise consulting stint
  • 2017: My first remote data science job
  • 2022: My first committed climate and biodiversity moves
  • 2023: My first data engineering conference talk

5 of 29

Disclaimer #2: Lessons may be obvious or better learnt the hard way

6 of 29

Snippet 1/5

7 of 29

2012

My first data science job

8 of 29

“Big Launch” prep with shiny new tech

  • Wild old days of the Facebook API
  • Giveable used the Facebook API to recommend gifts
  • Data Science work included plenty of engineering
  • Shiny new tech: MongoDB
  • Hit many early MongoDB snags: DB-level locks + various bugs
  • Some issues were self-inflicted (sharding...)

9 of 29

Lesson: Shiny tech ain’t always shiny

We should forget about small efficiencies, say about 97% of the time:

premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%.

– Donald Knuth, 1974

10 of 29

Snippet 2/5

11 of 29

2013

My first head of data science job

12 of 29

Real launch leads to real scaling problems

  • Head of Data Science work included plenty of engineering
  • Hynt started getting real traffic in Dec 2013
  • Tracking every recommendation in the database meant an imminent data explosion
  • Good timing:
    • Nov 2013: AWS Kinesis went live
    • Mid-Dec 2013: Jay Kreps published The Log (aka Kafka principles post)
    • Jan 2014: Crisis averted with event logger based on MongoDB + S3
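The log principle behind the event logger can be sketched as follows. This is a minimal illustration, not the system described in the talk: the `EventLogger` class, the event names, and the batch size are all hypothetical, and a local directory stands in for object storage such as S3. The idea is that events are immutable facts that are only ever appended, buffered briefly, and flushed in batches, so the main database never has to absorb per-event writes.

```python
import json
import tempfile
import time
from pathlib import Path


class EventLogger:
    """Append-only event log: buffer events in memory, flush in batches.

    Hypothetical sketch. The local directory stands in for object storage
    such as S3; the real design is not described in the source.
    """

    def __init__(self, out_dir, batch_size=3):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.batch_size = batch_size
        self.buffer = []
        self.batches_written = 0

    def log(self, event_type, **fields):
        # Events are immutable facts: we only ever append, never update.
        self.buffer.append({"ts": time.time(), "type": event_type, **fields})
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write the buffered events as one newline-delimited JSON batch file.
        if not self.buffer:
            return
        path = self.out_dir / f"batch-{self.batches_written:06d}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for event in self.buffer:
                f.write(json.dumps(event) + "\n")
        self.buffer = []
        self.batches_written += 1


def read_events(out_dir):
    """Replay all flushed events in the order they were written."""
    for path in sorted(Path(out_dir).glob("batch-*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


# Usage: log three hypothetical recommendation events, then replay them.
log_dir = tempfile.mkdtemp()
logger = EventLogger(log_dir, batch_size=2)
logger.log("recommendation_shown", user_id=1, item_id=42)
logger.log("recommendation_clicked", user_id=1, item_id=42)
logger.log("recommendation_shown", user_id=2, item_id=7)
logger.flush()  # flush the final partial batch
events = list(read_events(log_dir))
```

Because batches are written once and never modified, downstream consumers can replay them at any time, which is the property that matters more than the specific storage tool.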

13 of 29

Lesson: Shiny tech can be transformative; but principles beat tools

As to methods, there may be a million and then some, but principles are few.

The person who grasps principles can successfully select their own methods.

The person who tries methods, ignoring principles, is sure to have trouble.

– Harrington Emerson, 1911

14 of 29

Snippet 3/5

15 of 29

2015

My first enterprise consulting stint

16 of 29

Enterprise-scale blockers; human-scale workarounds

  • Data Science with team of 20+ DS & 100+ DE
    • ...still, plenty of engineering needed to get things done
  • Getting nothing done was also fine:
    • First month: No data access
    • Subsequent months: More managers than workers
    • Then: New project, meetings, and experiment live in production (!!!)
  • Some DEs seemed to favour functional over useful programming
    • Great people; bad incentives & structures
    • Workaround: Build my own Python pipeline

17 of 29

Lesson: Solve problems; don’t be the problem

Focus on the user and all else will follow.

– Google, ~2004

18 of 29

Snippet 4/5

19 of 29

2017

My first remote data science job

20 of 29

Automattic: Remote company; normal data problems

  • 4.5 years of “data science” with choose-your-own-adventure titles
    • Analytics engineering?
    • Machine learning engineering?
    • Causal inference tech lead?
  • Being able to speak and do engineering/ops helped get science into production
  • Went deep into many data rabbit holes

21 of 29

Lesson: Go deep; trust but verify

Given enough eyeballs, all bugs are shallow.

– Eric S. Raymond, 1999

22 of 29

Snippet 5/5

23 of 29

2022

My first committed climate and biodiversity moves

24 of 29

Recent highlights

  • 90% engineering?
  • Recently: Lead DS / Tech Lead with renewable energy startup
  • Techie / DE volunteer: Reef Life Survey & Work on Climate
  • Currently: Data & AI freelancing / own product work
  • Luxury of exploration

25 of 29

Lesson: Tech & titles are tools; focus on what matters

You are not obliged to complete the work, but neither are you free to desist from it.

– Rabbi Tarfon, ~100

26 of 29

Why am I here today?

  • 2012: My first data science job
  • 2013: My first head of data science job
  • 2015: My first enterprise consulting stint
  • 2017: My first remote data science job
  • 2022: My first committed climate and biodiversity moves
  • 2023: My first data engineering conference talk

27 of 29

Takeaway:

Data problems have human roots – and human solutions

28 of 29

Recap: Data problems have human roots & solutions

  • Humans get excited by shiny tech
    • ...and produce transformative tech
  • Humans optimise prematurely
    • ...and when it makes sense
  • Humans can act as unreasonable blockers
    • ...and as the users we serve
  • Humans generate messy data
    • ...and clean it up
  • Humans get distracted by tools
    • ...and use them for beneficial ends

29 of 29

Questions?