1 of 42

2 of 42

What is Machine Learning?

3 of 42

Outline

  • Why machine learning?
  • What is machine learning?
    • How does it work?
    • How can we train it?
    • What can it do?
  • Going to the real world
  • Challenges
  • Wrap-up

4 of 42

Is machine learning the same as AI?

[Diagram, not to scale: Artificial Intelligence and Machine Learning]

5 of 42

Why is machine learning so exciting?

Find patterns in data that are too complicated for a human to detect.

Find solutions to problems that are difficult to solve with traditional programming.

6 of 42

Why is machine learning gaining popularity?

  • Solves hard problems
  • More accessible all the time
    • More data
    • More compute resources
    • Better tools

7 of 42

What is machine learning?

Computers learn to solve problems without being explicitly programmed.

First, training:

  Known Input + Known Output → Model

Then, evaluation:

  Input → Learned Model → Output

Classic computer science:

  Input → Coded Logic → Output

Machine learning:

  Input → Learned Model → Output

8 of 42

Given temperature, can we predict rain or snow?

?

9 of 42

How would you solve the problem?

10 of 42

Computer needs to figure it out from data.

Recorded past weather data:

  Temperature in °F    Precipitation
  27                   Snow
  52                   Rain
  65                   Rain
  10                   Snow
  71                   Rain
  ...                  ...

11 of 42

Training the model

  1. Find min and max temps.
  2. For each value between min and max temps, calculate how accurately it separates “rain” temps from “snow” temps.
  3. Return best predictor temp and its accuracy.
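
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `train`, the rule "below the threshold means Snow, otherwise Rain", and stepping through whole degrees are choices made here, not details given on the slide. The six rows are the ones visible in the slide's training table.

```python
from typing import List, Tuple

# The six rows visible on the slide: (temperature in °F, precipitation).
TRAINING_DATA: List[Tuple[int, str]] = [
    (27, "Snow"), (52, "Rain"), (65, "Rain"),
    (10, "Snow"), (71, "Rain"), (17, "Snow"),
]


def train(data: List[Tuple[int, str]]) -> Tuple[int, float]:
    """Return (best predictor temp, its accuracy).

    Assumed rule: a temperature below the threshold predicts Snow,
    anything else predicts Rain.
    """
    # Step 1: find min and max temps.
    temps = [temp for temp, _ in data]
    min_temp, max_temp = min(temps), max(temps)

    best_temp, lowest_error = min_temp, 1.0
    # Step 2: score every whole-degree candidate between min and max.
    for current_temp in range(min_temp, max_temp + 1):
        wrong = sum(
            1
            for temp, label in data
            if ("Snow" if temp < current_temp else "Rain") != label
        )
        current_error = wrong / len(data)
        if current_error < lowest_error:
            best_temp, lowest_error = current_temp, current_error

    # Step 3: return the best predictor temp and its accuracy.
    return best_temp, 1.0 - lowest_error


best_temp, accuracy = train(TRAINING_DATA)
print(best_temp, accuracy)
```

On these six rows alone, every threshold from 28 to 52 separates snow from rain perfectly, and the sketch keeps the first one it finds. A larger data set narrows that range.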

Training data:

  Temperature in °F    Precipitation
  27                   Snow
  52                   Rain
  65                   Rain
  10                   Snow
  71                   Rain
  17                   Snow
  ...                  ...

Training state:

  Variable                      Value
  min temp
  max temp
  current temp
  current error
  best predictor temp so far
  lowest error so far

12 of 42

Trained model

  • Find min and max temps.
  • For each value between min and max temps, calculate how accurately it separates “rain” temps from “snow” temps.
  • Return best predictor temp and its accuracy.

Training data:

  Temperature in °F    Precipitation
  27                   Snow
  52                   Rain
  65                   Rain
  10                   Snow
  71                   Rain
  17                   Snow
  ...                  ...

Training state:

  Variable                      Value
  min temp                      -1
  max temp                      91
  current temp                  91
  current error                 0.66
  best predictor temp so far    34
  lowest error so far           0.03

13 of 42

What did the model learn?

14 of 42

What does the model do with new data?

[Figure: temperature axis with the learned threshold at 34°F and a new input, 39°F?]

The model predicts snow or rain based on temperature!
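
Evaluation with the trained model is just a comparison against the learned temperature. A minimal sketch, assuming the 34°F threshold from the trained model; the function name `predict` and the choice to treat exactly 34°F as rain are assumptions made here:

```python
def predict(temperature_f: float, threshold_f: float = 34.0) -> str:
    """Predict precipitation type from temperature.

    Assumed rule: below the learned threshold is Snow, otherwise Rain.
    """
    return "Snow" if temperature_f < threshold_f else "Rain"


print(predict(39))  # the new input from the slide
print(predict(27))  # a temperature from the training table
```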

15 of 42

Some terminology

A machine learning algorithm uses the features and labels in training data to train a model.

The model can then be used to evaluate new data to predict labels.

  Term             Example
  Features         Temperature
  Labels           Rain/Snow
  Model            Snow < 34° < Rain
  Training data    Table of temperatures and precipitation type
  Training         Try every temperature to find the one that produces the most accurate results
  Evaluation       Compare new input to learned temperature, predict rain or snow

16 of 42

Features, labels, model, prediction

Training:

  Features: Temperatures
  Labels: Rain/Snow
  Model: Snow < 34° < Rain

Evaluation:

  Input: 39°
  Model: Snow < 34° < Rain
  Prediction: Rain!

17 of 42

More features? More dimensions.

  • 1 feature: Temperature
  • 2 features: Temperature, Humidity
  • 3 features: Temperature, Humidity, Altitude

18 of 42

Types of training

  • Supervised
  • Unsupervised
  • Semi-supervised
  • Reinforcement

19 of 42

Supervised Learning

Model learns from examples where the correct output for each input is known.

  • Weather data
  • Mail labeled as spam
  • Labeled photos of animals

20 of 42

Unsupervised Learning

Model learns from the input alone, with no known outputs.

  • Purchase data
  • Movie preference
  • Crime reports

21 of 42

Other training techniques

Semi-Supervised Learning

Reinforcement Learning

22 of 42

What can these models do?

  • Classification
  • Regression
  • Clustering

23 of 42

Classification

Predict which category an input belongs to.

Example uses:

  • Rain or snow prediction
  • Spam detection
  • Object detection in images

24 of 42

Regression

Predict continuous, numeric output based on input.

Example uses:

  • Predict property value based on sales of similar homes
  • Project future spending based on past spending
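
As a rough illustration of the second use, a straight line can be fitted to past spending with ordinary least squares and then extended one month ahead. The spending figures and the names `fit_line`, `MONTHS`, and `SPENDING` are hypothetical, made up for this sketch:

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past spending: month number and amount spent.
MONTHS = [1.0, 2.0, 3.0, 4.0]
SPENDING = [100.0, 120.0, 140.0, 160.0]

slope, intercept = fit_line(MONTHS, SPENDING)
next_month = slope * 5 + intercept  # continuous, numeric prediction
print(slope, intercept, next_month)
```

Unlike the rain-or-snow model, the output here is a number on a continuous scale rather than one of a fixed set of categories.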

25 of 42

Clustering

Locate naturally occurring groups within the input data.

Example uses:

  • Security anomaly detection
  • Movie recommendations
  • Clique analysis in social networks
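
One common way to locate groups is k-means clustering. The slide does not name an algorithm, so this is a swapped-in technique for illustration: a minimal one-dimensional sketch with two groups, where the data points, the function name `two_means`, and the choice to start the centers at the smallest and largest values are all assumptions.

```python
from typing import List, Tuple


def two_means(
    points: List[float], iterations: int = 10
) -> Tuple[List[float], List[List[float]]]:
    """Split 1-D points into two groups with a basic k-means loop."""
    # Assumed starting centers: the smallest and largest points.
    centers = [min(points), max(points)]
    groups: List[List[float]] = [[], []]
    for _ in range(iterations):
        # Assign each point to its nearest center.
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers, groups


# Hypothetical unlabeled data with two obvious groups.
centers, groups = two_means([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
print(centers, groups)
```

No labels are supplied anywhere in the sketch; the groups come from the input alone.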


26 of 42

How do we go from dots to the real world?

"How stellar is the weather today, am I right?!" Is this message Happy or Sad?

[Labeled pictures: Cat, Cat, Cat, Dog, Ewok]

27 of 42

Images as numeric input
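
One simple way to turn an image into numbers is to read off its pixel brightness values and flatten them into a single list of features. The tiny 3×3 grayscale image and the name `image_to_features` below are hypothetical, and scaling to the 0-1 range is a common convention rather than something the slide specifies:

```python
from typing import List

# Hypothetical 3x3 grayscale image: 0 is black, 255 is white.
IMAGE: List[List[int]] = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]


def image_to_features(image: List[List[int]]) -> List[float]:
    """Flatten rows of pixels into one feature list scaled to 0..1."""
    return [pixel / 255 for row in image for pixel in row]


features = image_to_features(IMAGE)
print(len(features), features)
```

Each pixel becomes one feature, so even a small photo yields thousands of dimensions.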

28 of 42

Words as numeric input

29 of 42

Words as numeric input

Puppy (2,4)

Kitten (4,4)

Cat (3,1)

Dog (1,1)

Puppy - Dog + Cat = Kitten

(2,4) - (1,1) = (1,3), and (1,3) + (3,1) = (4,4)
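
The word arithmetic on this slide can be checked directly. A small sketch using the four coordinates shown; the helper names `add`, `subtract`, and `nearest_word` are invented for this example:

```python
from typing import Dict, Tuple

Vector = Tuple[int, int]

# Coordinates as given on the slide.
VECTORS: Dict[str, Vector] = {
    "puppy": (2, 4),
    "kitten": (4, 4),
    "cat": (3, 1),
    "dog": (1, 1),
}


def add(a: Vector, b: Vector) -> Vector:
    return (a[0] + b[0], a[1] + b[1])


def subtract(a: Vector, b: Vector) -> Vector:
    return (a[0] - b[0], a[1] - b[1])


def nearest_word(point: Vector) -> str:
    """Return the known word whose vector is closest to the point."""
    return min(
        VECTORS,
        key=lambda w: (VECTORS[w][0] - point[0]) ** 2
        + (VECTORS[w][1] - point[1]) ** 2,
    )


# Puppy - Dog + Cat
result = add(subtract(VECTORS["puppy"], VECTORS["dog"]), VECTORS["cat"])
print(result, nearest_word(result))
```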

30 of 42

Neural networks

31 of 42

Hidden layers combine inputs

[Diagram: a neural network with three layers]

  Input: Temperature (27°), Humidity (60%), Altitude (30m)
  Hidden Layer: ...
  Output: Snow
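
To make "combining inputs" concrete, here is a minimal forward pass through one hidden layer. Every weight and bias is a hypothetical number hand-picked for this sketch (a real network learns them during training), and the input scaling, the ReLU and sigmoid functions, and all names are assumptions, not details from the slide.

```python
import math
from typing import List

# Hypothetical, hand-picked parameters; a trained network would learn these.
HIDDEN_WEIGHTS: List[List[float]] = [
    [-4.0, 0.0, 1.0],  # hidden unit 1: mostly reacts to low temperature
    [0.0, 2.0, 0.0],   # hidden unit 2: reacts to humidity
]
HIDDEN_BIASES: List[float] = [1.5, -1.0]
OUTPUT_WEIGHTS: List[float] = [3.0, 2.0]
OUTPUT_BIAS: float = -1.0


def relu(value: float) -> float:
    return max(0.0, value)


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def snow_probability(temp_f: float, humidity_pct: float, altitude_m: float) -> float:
    """Run the inputs through the hidden layer and the output unit."""
    # Scale inputs to comparable ranges (an assumed preprocessing step).
    inputs = [temp_f / 100, humidity_pct / 100, altitude_m / 1000]
    # Each hidden unit is a weighted combination of all inputs.
    hidden = [
        relu(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    # The output unit combines the hidden units into one score.
    score = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS
    return sigmoid(score)


def predict_precipitation(temp_f: float, humidity_pct: float, altitude_m: float) -> str:
    return "Snow" if snow_probability(temp_f, humidity_pct, altitude_m) > 0.5 else "Rain"


print(predict_precipitation(27, 60, 30))  # the inputs shown on the slide
```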

32 of 42

Each layer solves part of the problem

33 of 42

Deep Learning networks have lots of layers

34 of 42

Challenges

  • Sparse data
  • Overfitting
  • Biased data
  • Unintended proxy variables

35 of 42

What can go wrong: Sparse data

Insufficient training data for the model to learn from.

  °F     Precipitation
  47     Rain
  52     Rain
  65     Rain
  38     Rain
  ...    ...

36 of 42

What can go wrong: Overfitting

Model matches the noise instead of the signal in the training data.

37 of 42

What can go wrong: Biased data

Incomplete or non-representative training data results in skewed predictions.

38 of 42

What can go wrong: Unintended proxy variables

Model keys off an unintended signal that happens to correlate with the label.

39 of 42

Current challenges

Labeled data is limited.

ML models are highly specialized to the domain of their training data.

Opaque ML models make it hard to determine if predictions are biased.

Experimental results are often not reproducible.

40 of 42

Wrap-up

  • Industry roles
  • Machine learning vs. artificial intelligence?

41 of 42

Industry roles

Data scientist

Data engineer

Software engineer

QA engineer for models

Data annotator

Domain expert

Computational linguist

AI/ML researcher

Ethicist

Lawyer

Marketer

Social media strategist

Public relations specialist

Product manager

Designer

HCI/UX expert

...

42 of 42

More to read and play with