Process and Technical Debt
Machine Learning in Production
Machine Learning in Production/AI Engineering · Claire Le Goues & Austin Henley · Spring 2025
Process...
Readings
Required Reading:
Suggested Readings:
Learning Goals
What is Process?
Software Process
“The set of activities and associated results that produce a software product”
A structured, systematic way of carrying out these activities
Q. Examples?
Example of Process Activities?
Developers dislike processes
What does a developer’s day look like?
Developers' view of processes
What developers want
What developers think of processes
What eventually happens anyway
Hypothesis: Process increases flexibility and efficiency; the upfront investment pays off in greater returns later
Survival Mode
Missed deadlines -> "solo development mode" to meet own deadlines
Ignore integration work
Stop interacting with testers, technical writers, managers, ...
-> Results in further project delays, added costs, poor product quality...
Example of Process Problems?
Example: Healthcare.gov
Case Study: Real Estate Website
ML Component: Predicting Real Estate Value
Given a large database of house sales and statistical/demographic data from public records, predict the sales price of a house.
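To make the task concrete, here is a minimal sketch of the simplest possible version of this component: a one-feature least-squares line from living area to sale price. The data, the `fit_line` and `predict` helpers, and the choice of a single feature are all assumptions for illustration, not part of the case study.

```python
from statistics import mean

# Hypothetical toy data: (living area in square feet, sale price in USD).
SALES = [(1000, 200_000), (1500, 290_000), (2000, 410_000), (2500, 500_000)]


def fit_line(points):
    """Fit price = slope * area + intercept by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, area):
    """Predict a sale price for a house with the given living area."""
    slope, intercept = model
    return slope * area + intercept


model = fit_line(SALES)
print(round(predict(model, 1800)))
```

A real component would draw on many more features and records; the point is only that even a baseline like this raises process questions about data, evaluation, and integration.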
What's your process?
Q. What steps would you take to build this component?
Exploratory Questions
Time estimation
Hofstadter’s Law
Is Estimation Evil?
Data Science: Iteration and Exploration
Data Science is Iterative and Exploratory
Science mindset: start with rough goal, no clear specification, unclear whether possible
Heuristics and experience to guide the process
Trial and error, refine iteratively, hypothesis testing
Go back to data collection and cleaning if needed, revise goals
Different Trajectories
Computational Notebooks
Notebooks Support Iteration and Exploration
Quick feedback, similar to REPL
Visual feedback including figures and tables
Incremental computation: reexecuting individual cells
Quick and easy: copy paste, no abstraction needed
Easy to share: document includes text, code, and results
Share Experience?
Brief Discussion: Notebook Limitations and Drawbacks?
Different Trajectories
Software Process Models
Ad-hoc Processes
Waterfall Model
Understand requirements, plan & design before coding, test & deploy
Looks like mass manufacturing?
Problems with Waterfall?
Risk First: Spiral Model
Incremental prototypes, starting with most risky components
Constant iteration: Agile
Selecting Process Models
Individually, vote in #lecture on Slack: [1] Ad-hoc [2] Waterfall [3] Spiral [4] Agile
Data Science vs Software Engineering
Discussion: Iteration in Notebook vs Agile?
Poor Software Engineering Practices in Notebooks?
Understanding Data Scientist Workflows
Instead of blindly recommending "SE best practices," understand the context
Documentation and testing not a priority in exploratory phase
Help with transitioning into practice
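As one possible illustration of the transition into practice, the sketch below takes a notebook-style snippet (shown in the comment) and wraps it in a named function with a small test. The `average_price` function and the sample values are hypothetical.

```python
# Exploratory notebook cells often look like this: globals, no reuse, no tests.
#   prices = [200000, None, 400000]
#   clean = [p for p in prices if p is not None]
#   avg = sum(clean) / len(clean)


def average_price(prices):
    """Return the mean of the non-missing prices, or None if there are none."""
    clean = [p for p in prices if p is not None]
    if not clean:
        return None  # avoids the ZeroDivisionError the notebook cell would hit
    return sum(clean) / len(clean)


def test_average_price():
    assert average_price([200_000, None, 400_000]) == 300_000
    assert average_price([None]) is None
    assert average_price([]) is None


test_average_price()
```

Nothing here is needed while exploring; it becomes worthwhile once the code is reused or shipped.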
Data Science Practices by Software Eng.
Integrated Process for AI-Enabled Systems
Recall: ML models are system components
Process for AI-Enabled Systems
Trajectories
Not every project follows the same development process, e.g.
Different focus on system requirements, qualities, and upfront planning
Manage interdisciplinary teams and different expectations
Technical debt
Technical Debt Metaphor
Analogy to financial debt
Ideally, a deliberate decision (short term tactical or long term strategic)
Ideally, track debt and plan for paying it down later
Q. Examples?
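One way to track debt and plan its paydown is a small register that records, for each item, a rough "principal" (effort to fix) and "interest" (ongoing overhead). This is a sketch under assumed names and made-up estimates; `DebtItem`, its fields, and the break-even heuristic are illustrative, not an established method.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    description: str
    deliberate: bool          # was this a conscious trade-off?
    payoff_hours: float       # estimated one-time effort to fix ("principal")
    weekly_cost_hours: float  # estimated ongoing overhead ("interest")

    def break_even_weeks(self) -> float:
        """Weeks until accumulated overhead equals the cost of fixing."""
        if self.weekly_cost_hours == 0:
            return float("inf")
        return self.payoff_hours / self.weekly_cost_hours


# Hypothetical entries with made-up estimates.
register = [
    DebtItem("Hard-coded feature list in notebook", False, 4, 0.5),
    DebtItem("No CI pipeline yet", True, 16, 4),
]

# Pay down first whatever recoups its fix cost soonest.
by_priority = sorted(register, key=DebtItem.break_even_weeks)
for item in by_priority:
    print(item.description, item.break_even_weeks())
```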
What causes technical debt?
Technical Debt: Examples
Prudent & deliberate: Skip using a CI platform
Reckless & inadvertent: Forget to encrypt user credentials in DB
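For the credentials example, a minimal sketch of what avoiding that debt could look like, using only the Python standard library. Passwords are typically stored as salted hashes from a key-derivation function rather than reversibly encrypted, so the sketch uses PBKDF2. The function names and the iteration count are assumptions and would need tuning in a real system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both instead of the plaintext password."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)
```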
Breakout: Technical Debt from ML
As a group in #lecture, tagging members: Post two plausible examples of technical debt in a housing price prediction system:
Technical Debt through Notebooks?
Jupyter Notebooks are a gift from God to those who work with data. They allow us to do quick experiments with Julia, Python, R, and more -- John Paul Ada
ML and Technical Debt
Often reckless and inadvertent in inexperienced teams
ML can seem like an easy addition, but it may cause long-term costs
Needs to be maintained, evolved, and debugged
Goals may change, environment may change, some changes are subtle
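As a rough illustration of catching one kind of subtle environment change, the sketch below compares the mean of a feature in recent production data against its mean at training time. The data, the `mean_shift` helper, and the threshold are all hypothetical; real monitoring would likely use more robust statistics.

```python
from statistics import mean, stdev


def mean_shift(training, recent):
    """Shift of the recent mean from the training mean, in training std devs."""
    return abs(mean(recent) - mean(training)) / stdev(training)


# Hypothetical living areas (square feet) at training time and in production.
training_areas = [1000, 1500, 2000, 2500]
recent_areas = [2400, 2600, 2800, 3000]

THRESHOLD = 1.0  # assumed alert threshold, not a recommended value
drifted = mean_shift(training_areas, recent_areas) > THRESHOLD
print("drift suspected" if drifted else "no drift detected")
```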
Example problems: ML and Technical Debt
Controlling Technical Debt from ML Components
Summary
Data scientists and software engineers follow different processes
ML projects need to consider process needs of both
Iteration and upfront planning are both important; process models codify good practices
Deliberate technical debt can be good, but too much debt can suffocate a project
Easy to amass (reckless) technical debt with machine learning
Further Reading
Further Reading 2