Important Takeaways on Using ML/AI in Practice
Rayid Ghani
Takeaways
All Your Modeling Decisions Should Reflect How The AI System Will Be Used
What is your goal (it’s not to build a model)? What are your constraints?
What are you trying to generalize to?
Who/what is this model going to be applied to? When?
What’s the right metric for “accuracy” (rarely is this AUC or F1)?
What does fairness mean in this context?
Remember to Think 4th-Dimensionally (Time Matters)
When is your prediction being made?
What information is available for features at that time? What isn’t?
How frequently will the model be updated?
Over what time horizon is your label/outcome occurring?
Leave time to collect labels/outcomes between training and validation
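The points above about prediction time and label horizon can be sketched as a temporal split. This is a minimal illustration, not a prescribed method: the function name `temporal_split`, the monthly prediction dates, and the 90-day label horizon are all assumptions made for the example.

```python
from datetime import date, timedelta

def temporal_split(as_of_dates, train_end, label_horizon_days):
    """Split prediction dates into train and validation sets with a label gap.

    Hypothetical helper: a training row is kept only if its outcome window
    closes on or before train_end, so no training label depends on
    information from the validation period.
    """
    horizon = timedelta(days=label_horizon_days)
    # Training rows: the outcome must be fully observed by train_end.
    train = [d for d in as_of_dates if d + horizon <= train_end]
    # Validation rows: predictions made at or after train_end.
    validation = [d for d in as_of_dates if d >= train_end]
    return train, validation

# Assumed setup: one prediction date per month, outcomes observed over 90 days.
dates = [date(2020, m, 1) for m in range(1, 13)]
train, validation = temporal_split(
    dates, train_end=date(2020, 7, 1), label_horizon_days=90
)
```

Prediction dates whose outcome window straddles the split date (May and June in this setup) land in neither set. That gap is the time left to collect labels.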
Data != Matrix
Data doesn’t come with labels, you have to create them
Data doesn’t come with features, you have to construct them
Rows in the raw data are rarely the rows in the training/validation matrix
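As a rough sketch of what turning raw data into a matrix can involve, the example below builds one row per entity per as-of date from a hypothetical event log. The event types, the `n_visits_before` feature, the "dropout" outcome, and the 90-day horizon are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical raw data: one row per event, not one row per entity.
events = [
    ("a", date(2021, 1, 10), "visit"),
    ("a", date(2021, 2, 5), "visit"),
    ("a", date(2021, 4, 20), "dropout"),
    ("b", date(2021, 1, 15), "visit"),
    ("b", date(2021, 8, 1), "dropout"),
]

def build_row(entity, as_of, horizon_days, events):
    """Build one matrix row for an entity as of a given prediction date."""
    horizon_end = as_of + timedelta(days=horizon_days)
    # Features may only use events strictly before the prediction date.
    past = [e for e in events if e[0] == entity and e[1] < as_of]
    # The label is constructed from events inside the outcome window.
    future = [e for e in events if e[0] == entity and as_of <= e[1] < horizon_end]
    n_visits = sum(1 for e in past if e[2] == "visit")
    label = int(any(e[2] == "dropout" for e in future))
    return {"entity": entity, "as_of": as_of,
            "n_visits_before": n_visits, "label": label}

as_of = date(2021, 3, 1)
matrix = [build_row(entity, as_of, 90, events) for entity in ("a", "b")]
```

Five raw event rows become two matrix rows here, and both the feature and the label exist only because of choices about the as-of date and the horizon.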
The Defaults Are Mostly Bad (aka Models Don’t Give 0/1 Labels)
Models give scores, not predicted classes, and those scores are rarely probabilities
There’s no such thing as absolute accuracy, precision, or recall; each depends on the score threshold you choose
A 0.5 score threshold is almost never what you want
Hyperparameters matter and defaults in many packages aren’t great
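To illustrate why a 0.5 threshold can be the wrong default, the sketch below compares it with acting on the top k scores. The scores and labels are made up, and `precision_at_top_k` is a hypothetical helper rather than a function from any particular library.

```python
def precision_at_top_k(scores, labels, k):
    """Fraction of true positives among the k highest-scored rows."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / k

# Made-up model scores: useful as a ranking, but none reaches 0.5.
scores = [0.42, 0.40, 0.35, 0.30, 0.22, 0.10, 0.08, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

# A 0.5 cutoff flags nobody, so the "predicted class" is 0 for every row.
flagged_at_half = [s for s in scores if s >= 0.5]

# If there is capacity to act on 3 cases, evaluate the top 3 instead.
p_at_3 = precision_at_top_k(scores, labels, 3)
```

With these numbers the 0.5 cutoff selects no rows at all, while two of the three highest-scored rows are true positives. The value of k would normally come from the intervention capacity, which is an assumption here.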
Compared To What?
What is the right baseline that your model needs to beat to be useful? By how much?
How is this decision currently made (or what commonsense, non-ML approach could be used)?
What is the performance of the current, human-driven decision-making process? How fair is it?
Do explainability methods give you more information than crosstabs and feature importances?
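One way to make "compared to what?" concrete is to score a commonsense heuristic with the same metric as the model. The sketch below uses invented data; the `prior_incidents` feature and the choice of precision in the top 3 are assumptions for the example.

```python
def precision_at_top_k(ranking_values, labels, k):
    """Fraction of true positives among the k highest-ranked rows."""
    ranked = sorted(zip(ranking_values, labels),
                    key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

# Invented outcomes, model scores, and a single commonsense feature.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
model_scores = [0.9, 0.2, 0.8, 0.7, 0.1, 0.6, 0.3, 0.2, 0.1, 0.05]
prior_incidents = [3, 0, 2, 0, 1, 4, 0, 0, 1, 0]

k = 3
# Expected precision when picking k rows at random is the base rate.
base_rate = sum(labels) / len(labels)
model_p = precision_at_top_k(model_scores, labels, k)
# Non-ML baseline: rank by the number of prior incidents.
heuristic_p = precision_at_top_k(prior_incidents, labels, k)
```

In this toy data the model beats random selection, but the single-feature heuristic beats the model. That is the kind of result worth checking before deploying anything more complex.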
Be Skeptical
Many published results are overstated (at best) and have not been tested in real-world use
Try things out for yourself and understand their limitations in your context and use case
Interpretability is far from a solved problem. So is algorithmic fairness.
… or for that matter, generalizing to the future, top k classification, etc.
Models Encode Values
Be explicit about what you want the model to do
… and validate that the model actually does this
Understand (and seek out) the perspectives of different stakeholders, including the people affected by the model
The goal is a fair overall system, not just a fair model
Recap
Scoping
Data Acquisition
Data Storage
Data Linkage
Data Exploration
Analytical Formulation
ML Pipelines
Baselines
Feature Generation
Train/Test Splits
Evaluation Metrics
Modeling
Model Selection
Interpretability
Bias/Fairness
Field Trials
Deployment
Maintenance & Monitoring