Data Engineering (AI5308/AI4005)
Apr 27: Model Development and Offline Evaluation (Ch. 6)
Sundong Kim
Course website: https://sundong.kim/courses/dataeng23sp/ | Contents from CS 329S (Chip Huyen, 2022) | cs329s.stanford.edu
Get MLOps Certified With The Course From Weights & Biases: Visit http://wandb.me/MLOps-Course (3 hrs)
Today: Skim through https://docs.google.com/presentation/d/1aMrSJyUqS5hkT_2Ljs9nhOEt2zSS7fe5u9jn_HfcUis/edit?usp=sharing (15 min)
Useful Tools - W&B for exp tracking (a.k.a. wandb)
Bringing machine learning models to production is challenging, with a continuous iterative lifecycle that consists of many complex components. Having a disciplined, flexible and collaborative process - an effective MLOps system - is crucial to enabling velocity and rigor, and building an end-to-end machine learning pipeline that continually delivers production-ready ML models and services.
Integrate Weights & Biases with PyTorch: https://www.youtube.com/watch?v=G7GH0SeNBMA
Useful Tools - MLflow for your ML lifecycle
See the YouTube video: https://www.youtube.com/watch?v=VokAGy8C6K4
Try the MLflow tutorial (spend 3-4 hours on this): https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html
For advanced users: see this PyCon talk (in Korean): https://www.youtube.com/watch?v=H-4ZIfOJDaw
Self-study these tools - by May 17
Project Mid Review (May 2)
See this page and submit the form when ready. https://sundong.kim/courses/dataeng23sp/final-project/#mid-review
Team 10’s Presentation
Critique Scoring Process
Score distribution (count of students giving each score):

Score | Critique 1 | Critique 2 |
0 | 2 | 3 |
2 | 3 | 2 |
3 | 28 | 14 |
4 | 6 | 9 |
5 | 7 | 18 |
Avg | 3.38 | 3.82 |
TA (Hongyiel)!!
Possible Discussion
→ Search "A/B test" and see Ronny Kohavi's KDD keynote: slides at https://exp-platform.com/Documents/2015-08OnlineControlledExperimentsKDDKeynoteNR.pdf, video at https://www.youtube.com/watch?v=HEGI5QN3fXE (next page)
A/B Testing Pitfalls: Getting Numbers You Can Trust is Hard
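Kohavi's talk is about why trustworthy numbers are hard to get. As background for the discussion, here is a standard two-proportion z-test (a textbook method, not taken from the keynote) for comparing conversion rates between a control arm A and a treatment arm B; the traffic and conversion counts below are hypothetical:

```python
import math

def ab_ztest(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: did variant B's conversion rate differ from A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 10% vs. 15% conversion with 1,000 users per arm.
z, p = ab_ztest(100, 1000, 150, 1000)
```

Even with a correct test, the pitfalls in the talk (peeking, skewed traffic splits, instrumentation bugs) can invalidate the numbers fed into it.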
Tons of applied data science papers
Model selection
ML algorithm
6 tips for evaluating ML algorithms
2. Start with the simplest models
3. Avoid human biases in selecting models
4. Better now vs. better later
Learning curve
Good for estimating whether performance can improve with more data
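A learning curve plots validation performance against training-set size. A minimal sketch of computing one, using synthetic data and a deliberately trivial "predict the training mean" model (both made up for illustration):

```python
import random

def learning_curve(train, val, sizes, seed=0):
    # "Model" is deliberately trivial: predict the training subset's mean.
    # The curve records validation MSE at growing training-set sizes.
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        subset = rng.sample(train, n)  # training subset of size n
        pred = sum(subset) / n         # "fit" the model
        mse = sum((y - pred) ** 2 for y in val) / len(val)
        curve.append((n, mse))
    return curve

data_rng = random.Random(1)
train = [5 + data_rng.gauss(0, 1) for _ in range(200)]  # synthetic data
val = [5 + data_rng.gauss(0, 1) for _ in range(100)]
curve = learning_curve(train, val, [10, 50, 200])
```

If the curve is still dropping at the largest size, more data is likely to help; if it has flattened, the model or features are the bottleneck.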
5. Evaluate trade-offs
6. Understand your model's assumptions
Ensembles
Base learners
Ensembles: extremely common in leaderboard-style projects
Why does ensembling work?
Assume three independent models, each correct with probability 0.7, combined by majority vote:

Outputs of 3 models | Probability | Ensemble's output |
All 3 are correct | 0.7 * 0.7 * 0.7 = 0.343 | Correct |
Only 2 are correct | (0.7 * 0.7 * 0.3) * 3 = 0.441 | Correct |
Only 1 is correct | (0.3 * 0.3 * 0.7) * 3 = 0.189 | Wrong |
None are correct | 0.3 * 0.3 * 0.3 = 0.027 | Wrong |

The majority vote is correct with probability 0.343 + 0.441 = 0.784, better than any single model's 0.7.
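The table's arithmetic can be checked by enumerating all 2^3 outcomes; this is a sketch assuming the three models err independently, each correct with probability 0.7:

```python
from itertools import product

p = 0.7  # accuracy of each of the three independent classifiers
majority_correct = 0.0
for outcome in product([True, False], repeat=3):
    prob = 1.0
    for correct in outcome:
        prob *= p if correct else (1 - p)
    if sum(outcome) >= 2:  # at least 2 of 3 correct -> majority vote is correct
        majority_correct += prob

print(round(majority_correct, 3))  # 0.343 + 0.441 = 0.784
```

The gain depends on the independence assumption: if the models make correlated mistakes, the ensemble's advantage shrinks.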
Ensemble
Bagging
Illustration by Sirakorn
Bagging
43
Illustration by Sirakorn
Bagging Predictors (Leo Breiman, 1996)
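A minimal sketch of the bootstrap-aggregating idea, not Breiman's exact procedure: the base learner here is deliberately trivial (it just predicts the mean of its bootstrap resample), and bagging averages the base learners' predictions:

```python
import random

def bootstrap_sample(data, rng):
    # Draw len(data) points with replacement (one bootstrap resample).
    return [rng.choice(data) for _ in data]

def bagged_mean(data, n_models=100, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        preds.append(sum(sample) / len(sample))  # one base learner's prediction
    return sum(preds) / len(preds)               # aggregate by averaging

print(bagged_mean([1, 2, 3, 4, 5]))  # close to the plain mean, 3.0
```

With high-variance base learners such as deep decision trees (as in random forests), this averaging is what reduces variance.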
Boosting
Illustration by Sirakorn
Extremely popular:
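One way to see boosting's core idea (each new weak learner focuses on the current ensemble's mistakes) is a tiny gradient-boosting sketch for regression, with hand-rolled depth-1 stumps as weak learners; the data and hyperparameters are illustrative, not from the slides:

```python
def fit_stump(xs, residuals):
    # Weak learner: a depth-1 stump predicting one constant at or below a
    # threshold and another above it, chosen to minimize squared error.
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((r - (lm if x <= t else rm)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=20, lr=0.5):
    # Each round fits a weak learner to the residuals of the current
    # ensemble, then adds it with a small learning rate.
    stumps, preds = [], [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]  # a step function for the stumps to approximate
model = boost(xs, ys)
```

Production libraries add regularization, tree depth, and clever split-finding on top of this same residual-fitting loop.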
Stacking
Meta-learner options: majority vote, logistic regression, or a simple neural network
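Of the meta-learners listed, majority vote is simple enough to sketch in a few lines; the base-model predictions below are made up for illustration:

```python
from collections import Counter

def majority_vote(predictions):
    # Stacking's simplest "meta-learner": per example, take the most common
    # label among the base models' predictions (columns of the matrix).
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

base_model_outputs = [
    ["cat", "dog", "dog"],  # hypothetical model 1's predictions
    ["cat", "cat", "dog"],  # hypothetical model 2
    ["dog", "cat", "dog"],  # hypothetical model 3
]
stacked = majority_vote(base_model_outputs)
print(stacked)  # ['cat', 'cat', 'dog']
```

A learned meta-model (logistic regression or a small NN) instead takes the base models' outputs as features, which lets it weight stronger models more heavily.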
Data Engineering
Next class: Model Development and Offline Evaluation (Ch. 6)
https://sundong.kim/courses/dataeng23sp/ | Sundong Kim