What is Machine Learning?
Outline
Is machine learning the same as AI?
[Diagram, not to scale: Machine Learning shown within Artificial Intelligence]
Why is machine learning so exciting?
Find patterns in data that are too complicated for a human to detect.
Find solutions to problems that are difficult to solve with traditional programming.
Why is machine learning gaining popularity?
What is machine learning?
Computers learn to solve problems without being explicitly programmed.
First, training: Known Input + Known Output → Model
Then, evaluation: Input → Learned Model → Output
Classic computer science: Input → Coded Logic → Output
Machine learning: Input → Learned Model → Output
Given temperature, can we predict rain or snow?
How would you solve the problem?
Computer needs to figure it out from data.
Recorded past weather data.
Temperature in °F | Precipitation |
27 | Snow |
52 | Rain |
65 | Rain |
10 | Snow |
71 | Rain |
... | ... |
Training the model
Training data
Temperature in °F | Precipitation |
27 | Snow |
52 | Rain |
65 | Rain |
10 | Snow |
71 | Rain |
17 | Snow |
... | ... |
Training state
variable | value |
min temp | |
max temp | |
current temp | |
current error | |
best predictor temp so far | |
lowest error so far | |
Trained model
Training data
Temperature in °F | Precipitation |
27 | Snow |
52 | Rain |
65 | Rain |
10 | Snow |
71 | Rain |
17 | Snow |
... | ... |
Training state
variable | value |
min temp | -1 |
max temp | 91 |
current temp | 91 |
current error | 0.66 |
best predictor temp so far | 34 |
lowest error so far | 0.03 |
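The training loop behind the training state table can be sketched roughly as follows. This is a minimal illustration, not the deck's actual implementation: the function names `error_rate` and `train` are made up here, the rule "below the threshold means Snow" is an assumption, and only the six rows visible in the table are used because the rest of the dataset is elided.

```python
def error_rate(threshold, data):
    """Fraction of rows misclassified by the rule: below threshold means Snow."""
    wrong = 0
    for temp, label in data:
        predicted = "Snow" if temp < threshold else "Rain"
        if predicted != label:
            wrong += 1
    return wrong / len(data)


def train(data):
    """Try every whole-degree temperature from min to max; keep the lowest error."""
    temps = [temp for temp, _ in data]
    min_temp, max_temp = min(temps), max(temps)
    best_temp, lowest_error = None, float("inf")
    for current_temp in range(min_temp, max_temp + 1):
        current_error = error_rate(current_temp, data)
        if current_error < lowest_error:
            best_temp, lowest_error = current_temp, current_error
    return best_temp, lowest_error


# Only the six rows shown in the table; the remaining rows are not available.
training_data = [(27, "Snow"), (52, "Rain"), (65, "Rain"),
                 (10, "Snow"), (71, "Rain"), (17, "Snow")]
best_temp, lowest_error = train(training_data)
```

On these six rows alone, every threshold from 28 to 52 separates the data perfectly, so the sketch keeps the first one it finds. It will not reproduce the 34 shown in the table, which came from the full dataset.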
What did the model learn?
What does the model do with new data?
Learned temperature: 34°F
New input: 39°F?
The model predicts snow or rain based on temperature!
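Using the trained model on new data comes down to one comparison. A minimal sketch, assuming the learned temperature of 34°F from the slide; the name `predict` is hypothetical, and treating exactly 34°F as Rain is an arbitrary choice the slide does not settle.

```python
LEARNED_THRESHOLD_F = 34  # learned during training, per the slide


def predict(temp_f, threshold_f=LEARNED_THRESHOLD_F):
    """Compare a new temperature to the learned one and predict a label."""
    # Assumption: a reading exactly at the threshold counts as Rain.
    return "Snow" if temp_f < threshold_f else "Rain"


prediction = predict(39)
```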
Some terminology
A machine learning algorithm uses the features and labels in training data to train a model.
The model can then be used to evaluate new data to predict labels.
Term | Example |
Features | Temperature |
Labels | Rain/Snow |
Model | Snow < 34° < Rain |
Training data | Table of temperatures and precipitation type |
Training | Try every candidate temperature and keep the one that produces the most accurate results |
Evaluation | Compare new input to learned temperature, predict rain or snow |
Features, labels, model, prediction
Training:
Features: Temperatures
Labels: Rain/Snow
Model: Snow < 34° < Rain
Evaluation:
Input: 39°
Model: Snow < 34° < Rain
Prediction: Rain!
More features? More dimensions.
Types of training
Supervised Learning
Model learns from inputs paired with their known correct outputs.
Unsupervised Learning
Model learns based only on the input.
Other training techniques
Semi-Supervised Learning
Reinforcement Learning
What can these models do?
Classification
Predict which category input belongs in.
Example uses:
Regression
Predict continuous, numeric output based on input.
Example uses:
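To make "continuous, numeric output" concrete, here is a small sketch of one common regression technique, a least-squares straight-line fit. The deck does not name a specific method, so this choice is an assumption, `fit_line` is a hypothetical helper, and the data points are made up to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# The model outputs a number, not a category.
predicted_y = slope * 5 + intercept
```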
Clustering
Locate naturally occurring groups within the input data.
Example uses:
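As a rough illustration of clustering, the sketch below groups the temperatures from the earlier weather table without looking at any Rain/Snow labels. It uses a simplified one-dimensional, two-group version of k-means, which is one possible technique and not necessarily the one the deck has in mind; `two_means_1d` is a hypothetical name.

```python
def two_means_1d(values, iterations=10):
    """Split numbers into two groups around two moving centers.

    Assumes the input contains at least two distinct values.
    """
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for value in values:
            # Assign each value to whichever center is closer.
            closer_to_first = abs(value - centers[0]) <= abs(value - centers[1])
            groups[0 if closer_to_first else 1].append(value)
        new_centers = [sum(group) / len(group) for group in groups]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return groups


# Temperatures only; no labels are used.
temperatures = [27, 52, 65, 10, 71, 17]
cold_group, warm_group = two_means_1d(temperatures)
```

The two groups happen to line up with the Snow and Rain rows of the table, even though the algorithm never saw those labels.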
How do we go from dots to the real world?
How stellar is the weather today, am I right?!
Is this message Happy or Sad?
Image labels: Cat, Cat, Cat, Dog, Ewok
Images as numeric input
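One common way to turn an image into numbers is to treat each pixel's brightness as a feature. The sketch below assumes an 8-bit grayscale image (0 is black, 255 is white); the 3x3 image is made up for illustration.

```python
# A made-up 3x3 grayscale image: a white plus sign on a black background.
image = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]

# Flatten the grid row by row and scale each pixel to the range 0..1,
# giving one numeric feature per pixel.
features = [pixel / 255 for row in image for pixel in row]
```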
Words as numeric input
Puppy (2,4)
Kitten (4,4)
Cat (3,1)
Dog (1,1)
Puppy - Dog + Cat = Kitten
Puppy - Dog = (1,3)
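The word arithmetic can be checked directly with the coordinates from the slide. In this sketch the helper names `subtract` and `add` are hypothetical, and real word vectors have many more than two dimensions.

```python
import math

# Coordinates taken from the slide.
vectors = {
    "Puppy": (2, 4),
    "Kitten": (4, 4),
    "Cat": (3, 1),
    "Dog": (1, 1),
}


def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


difference = subtract(vectors["Puppy"], vectors["Dog"])
target = add(difference, vectors["Cat"])

# Find the known word whose vector lies closest to the result.
nearest_word = min(vectors, key=lambda word: math.dist(vectors[word], target))
```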
Neural networks
Hidden layers combine inputs
Input: Temperature 27°, Humidity 60%, Altitude 30m
Hidden Layer: ...
Output: Snow
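A sketch of how a hidden layer combines inputs, using the three input values from the diagram. All weights and biases below are invented and untrained, so the resulting number means nothing about real weather; the ReLU and sigmoid activations and the two-node hidden layer are also assumptions chosen for illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, activation):
    """Each node takes a weighted sum of all inputs, adds a bias, then applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


# Temperature, humidity (%), altitude (m), as in the diagram.
inputs = [27.0, 60.0, 30.0]

# Hypothetical, untrained parameters: two hidden nodes, one output node.
hidden_weights = [[-0.1, 0.02, 0.0], [0.05, 0.01, 0.001]]
hidden_biases = [1.0, -1.0]
output_weights = [[1.0, -0.5]]
output_biases = [0.2]

hidden = layer(inputs, hidden_weights, hidden_biases, relu)
snow_score = layer(hidden, output_weights, output_biases, sigmoid)[0]
```

Training a network means adjusting those weights and biases until the output matches the labels in the training data.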
Each layer solves part of the problem
Deep Learning networks have lots of layers
Challenges
What can go wrong: Sparse data
Insufficient training data for the model to learn.
Temperature in °F | Precipitation |
47 | Rain |
52 | Rain |
65 | Rain |
38 | Rain |
... | ... |
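A quick way to see the problem with the table above: with no Snow rows, a wide range of thresholds all fit the data equally well. The sketch assumes the same "below the threshold means Snow" rule as the temperature example and uses only the four visible rows; the 0 to 100 search range is arbitrary.

```python
# Only the four rows shown; every one of them is Rain.
sparse_data = [(47, "Rain"), (52, "Rain"), (65, "Rain"), (38, "Rain")]


def error_rate(threshold, data):
    """Fraction of rows misclassified by the rule: below threshold means Snow."""
    wrong = sum(
        1 for temp, label in data
        if ("Snow" if temp < threshold else "Rain") != label
    )
    return wrong / len(data)


# Every threshold that classifies all training rows correctly.
perfect_thresholds = [t for t in range(0, 101) if error_rate(t, sparse_data) == 0]
```

Every threshold from 0 through 38 scores a perfect zero error, so the data gives the model no way to tell where snow actually begins.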
What can go wrong: Overfitting
Model matches the noise instead of the signal in the training data.
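One way to picture overfitting is a model that memorizes every training row, noise included. In the sketch below, the row `(40, "Snow")` is a hypothetical noisy reading added for illustration and does not appear in the source data; the memorizing model is a simple nearest-neighbor lookup, and the threshold model reuses the 34° rule from the earlier slides.

```python
training_data = [
    (27, "Snow"), (52, "Rain"), (65, "Rain"),
    (10, "Snow"), (71, "Rain"), (17, "Snow"),
    (40, "Snow"),  # hypothetical noisy row, not from the source table
]


def memorizing_model(temp):
    """Copy the label of the closest training temperature (fits noise too)."""
    _, nearest_label = min(training_data, key=lambda row: abs(row[0] - temp))
    return nearest_label


def threshold_model(temp, threshold=34):
    """The simple rule from the earlier slides."""
    return "Snow" if temp < threshold else "Rain"


def accuracy(model, data):
    return sum(model(temp) == label for temp, label in data) / len(data)


memorizing_accuracy = accuracy(memorizing_model, training_data)
threshold_accuracy = accuracy(threshold_model, training_data)

# On a new reading near the noisy row, the two models disagree.
new_temp = 39
memorized_prediction = memorizing_model(new_temp)
threshold_prediction = threshold_model(new_temp)
```

The memorizing model scores perfectly on its own training data, yet its prediction for 39° is driven entirely by the single noisy row.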
What can go wrong: Biased data
Incomplete or non-representative training data results in skewed predictions.
What can go wrong: Unintended proxy variables
Model keys off an unintended signal that happens to correlate with the label.
Current challenges
Labeled data is limited.
ML models are highly specialized to the domain of their training data.
Opaque ML models make it hard to determine if predictions are biased.
Experimental results are often not reproducible.
Wrap-up
Industry roles
Data scientist
Data engineer
Software engineer
QA engineer for models
Data annotator
Domain expert
Computational linguist
AI/ML researcher
Ethicist
Lawyer
Marketer
Social media strategist
Public relations specialist
Product manager
Designer
HCI/UX expert
...