1 of 51

Neural Networks

Unit 3, Module 3.5


2 of 51

A lot of the AI applications you’ve used have neural networks inside of them.

This is the dominant technology today for many areas of AI.


3 of 51

Only in the last 15 years have neural nets become good enough for practical use. Let’s look at where they started.



4 of 51

Decision Trees vs. Neural Networks

We learned how to build decision trees. They are useful for some types of problems.

  • Decision trees require less data to train.
  • Decision trees can “explain” their conclusions in terms of features we understand.

But for really hard problems, neural networks work better. Why?

Decision trees depend on having a good feature set. What if we don’t have one?

Neural networks can create their own features. This allows them to come up with sophisticated solutions to hard problems, such as understanding speech or images, that decision trees can’t handle.


5 of 51

Why are they called “neural” networks?

  • The human brain contains roughly 86 billion neurons wired together in intricate patterns. We can draw an analogy with deep neural networks.
  • But biological neurons are more complex than linear threshold units, and real brain wiring is more complex than artificial neural networks.


6 of 51

Beware the “hype” around neural networks

  • Historically, neural networks were inspired by ideas about how neurons in the brain might work.
  • But we don’t actually know how our brains reason, learn, imagine, etc. So…


Anyone who tells you that neural nets “work the way the brain does” is wrong. They have bought the hype.

Don’t buy the hype.

7 of 51

Neural Networks in a Minute



8 of 51

What is a neural network?

  • A large, complex mathematical function that maps inputs to outputs.
    • Because the function is so complex, the network can do sophisticated things.
  • Each layer is composed of many simple functions, called “units” or “neurons”.
  • Each unit takes multiple numbers as input and produces a number as output.

[Diagram: units connected by weighted connections; each unit takes in numbers and passes a number along.]

9 of 51

Neural Networks Are Organized In Layers

[Diagram: a network organized into an Input Layer, a Hidden Layer, and an Output Layer.]

10 of 51

Deep Neural Networks Have Many Layers


11 of 51

Let’s focus on how one neuron learns


12 of 51

Learning to predict dog bites


Start with a bunch of “experts”. This is our input layer.

Some experts are good for predicting dog bites.

But which are the good ones?

Every expert starts out with 1 vote.

13 of 51

Is this dog going to bite me? (1)

[Diagram: each expert still has 1 vote. Four experts answer NO and three answer YES. Tally: No = 4 votes, Yes = 3 votes, so the prediction is NO.]

14 of 51

Is this dog going to bite me? (1)

Doesn’t bite.

Give more votes to the experts who answered correctly. Subtract votes from the ones who got it wrong.

[Diagram: the four NO experts’ weights rise to 1.1 votes each; the three YES experts’ weights drop to 0.9 votes each.]

15 of 51

Is this dog going to bite me? (2)

[Diagram: for the next dog, the experts answer NO, NO, YES, YES, YES, NO, NO. Weighted tally: No = 4.0 votes, Yes = 3.1 votes, so the prediction is NO.]

16 of 51

Is this dog going to bite me? (2)

Does bite!

Give more votes to the experts who answered correctly. Subtract votes from the ones who got it wrong.

[Diagram: the experts’ weights are now 1.0, 1.0, 1.2, 1.2, 1.0, 0.8, 0.8.]

17 of 51

Is this dog going to bite me? (3)

[Diagram: the experts answer NO, YES, NO, NO, NO, YES, NO. Weighted tally: No = 5.2 votes, Yes = 1.8 votes, so the prediction is NO.]

18 of 51

Is this dog going to bite me? (3)

Doesn’t bite.

Give more votes to the experts who answered correctly. Subtract votes from the ones who got it wrong.

[Diagram: the experts’ weights are now 1.1, 0.9, 1.3, 1.3, 1.1, 0.7, 0.9.]

19 of 51

Is this dog going to bite me? (4)

[Diagram: the experts answer YES, NO, YES, YES, YES, NO, NO. Weighted tally: No = 2.5 votes, Yes = 4.8 votes, so the prediction is YES.]

20 of 51

Is this dog going to bite me? (4)

Does bite!

Give more votes to the experts who answered correctly. Subtract votes from the ones who got it wrong.

[Diagram: the experts’ weights are now 1.2, 0.8, 1.4, 1.4, 1.2, 0.6, 0.8.]

21 of 51

Turning our dog bite predictor into a neural network

[Diagram: the seven experts now feed a single “neuron” through weighted connections, using the weights learned above (1.2, 0.8, 1.4, 1.4, 1.2, 0.6, 0.8). The neuron sums the weighted inputs and checks whether the sum is greater than 0.]

The experts supply inputs xi = +1 for “yes” or -1 for “no”.

wi are the weights on inputs xi (“weight” = the number of votes that expert gets).

“Activation”: a = Σ wixi

a > 0 means “dog will bite”.
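To make this concrete, here is a minimal Python sketch of the neuron above (not from the slides), using the final weights and the expert answers from the last dog:

```python
# One "neuron" that tallies weighted expert votes.
# Each expert answers +1 ("yes, it will bite") or -1 ("no").
weights = [1.2, 0.8, 1.4, 1.4, 1.2, 0.6, 0.8]  # how many votes each expert gets
answers = [+1, -1, +1, +1, +1, -1, -1]          # the experts' answers for this dog

activation = sum(w * x for w, x in zip(weights, answers))
prediction = "will bite" if activation > 0 else "won't bite"
print(activation, prediction)  # about 3.0 -> "will bite"
```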

22 of 51

Let’s look at how the neuron computes its answer.

23 of 51

The simplest possible unit: the linear unit

[Diagram: inputs x1 and x2 are connected by weights w1 and w2 to a summation node that produces output y: Inputs → Weights → Summation → Output.]

x1 × w1 + x2 × w2 = y

Each input xi is multiplied by the corresponding weight wi. Sum the results.
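A tiny Python sketch of this computation (the sample numbers match the worked example on the next slide):

```python
def linear_unit(inputs, weights):
    """Multiply each input by its corresponding weight and sum the results."""
    return sum(x * w for x, w in zip(inputs, weights))

print(linear_unit([2, 3], [5, -2]))  # 2*5 + 3*(-2) = 4
```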

24 of 51

The simplest possible unit: the linear unit

[Diagram: the same unit with fixed weights w1 = 5 and w2 = -2; the output node shows 4.]

x1 × w1 + x2 × w2 = y
2 × 5 + 3 × (-2) = 4

Varying inputs (integer / whole-number inputs), fixed weights:

x1 | x2
 2 |  3
 1 |  4
 3 | -2
-2 | -5

25 of 51

The simplest possible unit: the linear unit

25

x1

x2

-2

5

-3

x1

x2

2

3

1

4

3

-2

-2

-5

×

×

x1 × w1 + x2 × w2 = y

1 × __ + 4 × ___ = ___

Enter the missing weights and solve the problem

26 of 51

The simplest possible unit: the linear unit

[Diagram: the same unit with weights w1 = 5 and w2 = -2; the output node shows 19. The input table is repeated, with one row highlighted in blue.]

x1 × w1 + x2 × w2 = y
__ × 5 + __ × (-2) = __

Enter the missing input values highlighted in blue in the table and solve the problem.

27 of 51

The simplest possible unit: the linear unit

[Diagram: the same unit with weights w1 = 5 and w2 = -2; the output node shows 0. The input table is repeated, with one row highlighted in blue.]

x1 × w1 + x2 × w2 = y
__ × __ + __ × __ = __

Enter the missing input values highlighted in blue in the table, enter the missing weights from the diagram, and solve the problem.

28 of 51

Linear Threshold Unit


29 of 51

The linear threshold unit produces a binary output

[Diagram: inputs x1 and x2, with weights w1 and w2, feed a unit that fires when its sum exceeds 0. The output is 0 for “no” or 1 for “yes”.]

Activation: a = x1 × w1 + x2 × w2

y = 1 if a > 0
y = 0 if a ≤ 0

Remember the Perceptron: binary outputs, 0 or 1.
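A minimal Python sketch of a linear threshold unit following the rule above; the threshold is left as a parameter because the examples that follow use thresholds other than 0:

```python
def linear_threshold_unit(inputs, weights, threshold=0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation > threshold else 0

print(linear_threshold_unit([1, 0], [1, 1]))  # activation 1 > 0, so output 1
```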

30 of 51

Should I wear boots today?

[Diagram: inputs “raining” and “snowing”, each with weight 1, feed a unit with threshold > 0; its output means “wear boots”.]

Raining? | Snowing? | Sum | Output: Wear Boots
No (0)  | No (0)  | 0 |
Yes (1) | No (0)  | 1 |
No (0)  | Yes (1) | 1 |
Yes (1) | Yes (1) | 2 |

Binary inputs - the input value can only be either yes (1) or no (0).

Use the values in the table to update the sum and output columns.

Note: this unit requires the sum to be greater than 0 to trigger “wear boots”.

Possible answers: 1 (Yes) or 0 (No)

31 of 51

Can I make a PBJ sandwich?

Must have peanut butter, jelly, and bread.

[Diagram: inputs “peanut butter”, “jelly”, and “bread”, each with weight 1, feed a unit with threshold > 2; its output means “can make sandwich”.]

Peanut butter | Jelly | Bread | Sum | Output: Sandwich
No (0)  | No (0)  | No (0)  | 0 | 0 (No)
Yes (1) | No (0)  | No (0)  | 1 |
No (0)  | Yes (1) | No (0)  |   |
Yes (1) | Yes (1) | No (0)  |   |
No (0)  | No (0)  | Yes (1) |   |
Yes (1) | No (0)  | Yes (1) |   |
No (0)  | Yes (1) | Yes (1) | 2 |
Yes (1) | Yes (1) | Yes (1) |   |

Use the values in the table to update the Sum and Output columns.

Note: this unit requires the sum to be greater than 2 to trigger “make a sandwich”.
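As a check on the table, here is a short Python sketch of this unit (all weights 1, threshold 2, as in the diagram); it prints one line per row of the table:

```python
from itertools import product

weights = [1, 1, 1]   # peanut butter, jelly, bread
threshold = 2         # the sum must exceed 2, so all three ingredients are required

for peanut_butter, jelly, bread in product([0, 1], repeat=3):
    total = (peanut_butter * weights[0] + jelly * weights[1] + bread * weights[2])
    output = 1 if total > threshold else 0
    print(peanut_butter, jelly, bread, "-> sum", total, "output", output)
```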

32 of 51

Unequal Weights


33 of 51

Does John get to eat dessert tonight?

If he ate his vegetables, finished his milk, and cleared the table, then yes. But if the dessert is healthy (fruit, not cake or ice cream), he can always have it.

[Diagram: inputs “ate veggies”, “finished milk”, and “cleared table” each have weight 1; “healthy dessert” has weight 5. The unit’s threshold is > 2, and its output means “gets dessert”.]

Healthy Dessert | Ate Vegetables | Finished Milk | Cleared Table | Sum | Output
No (0)  | No (0)  | No (0)  | No (0)  | 0 | 0
Yes (1) | No (0)  | No (0)  | No (0)  |   |
No (0)  | No (0)  | Yes (1) | No (0)  |   |
No (0)  | Yes (1) | No (0)  | Yes (1) |   |
No (0)  | Yes (1) | Yes (1) | Yes (1) |   |
Yes (1) | Yes (1) | Yes (1) | Yes (1) |   |

Use the values in the table to update the sum and output columns.

Note: this unit requires the sum to be greater than 2 to trigger “gets dessert”.
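A sketch in Python with the unequal weights from the diagram; the healthy-dessert weight of 5 can clear the threshold all by itself:

```python
def gets_dessert(healthy_dessert, ate_veggies, finished_milk, cleared_table):
    """Weighted unit with one large weight: a healthy dessert alone is enough."""
    total = (5 * healthy_dessert      # unequal weight: overrides the other inputs
             + 1 * ate_veggies
             + 1 * finished_milk
             + 1 * cleared_table)
    return 1 if total > 2 else 0

print(gets_dessert(1, 0, 0, 0))  # healthy dessert only -> 1
print(gets_dessert(0, 1, 1, 0))  # veggies and milk, table not cleared -> 0
```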

34 of 51

Negative weights


35 of 51

Negative weights: can Lucy buy garden supplies?

She can buy supplies if the store is open and she has money, unless her car is out of gas or has a flat tire.

[Diagram: inputs “store is open” and “has money” have weight 1; “out of gas” and “flat tire” have weight -5. The unit’s threshold is > 1, and its output means “can buy supplies”.]

Store open | Have money | Out of gas | Flat tire | Sum | Output
No (0)  | No (0)  | No (0)  | No (0)  |  |
Yes (1) | No (0)  | No (0)  | No (0)  |  |
No (0)  | Yes (1) | No (0)  | Yes (1) |  |
Yes (1) | Yes (1) | No (0)  | No (0)  |  |
Yes (1) | Yes (1) | Yes (1) | No (0)  |  |

Use the values in the table to update the sum and output columns.

Note: this unit requires the sum to be greater than 1 to trigger “can buy supplies”.
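A sketch with the weights from the diagram; the large negative weights act as vetoes that no combination of the positive inputs can overcome:

```python
def can_buy_supplies(store_open, has_money, out_of_gas, flat_tire):
    """Negative weights veto the decision even when the positive conditions hold."""
    total = (1 * store_open + 1 * has_money
             - 5 * out_of_gas - 5 * flat_tire)
    return 1 if total > 1 else 0

print(can_buy_supplies(1, 1, 0, 0))  # open, has money, car is fine -> 1
print(can_buy_supplies(1, 1, 1, 0))  # out of gas vetoes it -> 0
```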

36 of 51

Numbers as inputs


37 of 51

Integer instead of binary inputs: Can we go on a class trip?

The number of adults must exceed the number of children. Inputs to the network are integers.

[Diagram: input “# adults” has weight 1 and “# children” has weight -1; the unit’s threshold is > 0, and its output means “can go”.]

# of Adults | # of Children | Sum | Output
1 | 1 |  |
2 | 3 |  |
3 | 2 |  |
5 | 5 |  |
7 | 3 |  |

Use the values in the table and weights in the diagram to update the sum and output columns.

Note: this unit requires the sum to be greater than 0 to trigger “can go”.
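A sketch with integer inputs and the weights from the diagram (+1 per adult, -1 per child):

```python
def can_go_on_trip(num_adults, num_children):
    """Integer inputs: the trip is on only when the weighted sum is positive."""
    total = 1 * num_adults + (-1) * num_children
    return 1 if total > 0 else 0

print(can_go_on_trip(3, 2))  # 3 - 2 = 1 > 0 -> 1 (can go)
print(can_go_on_trip(5, 5))  # 5 - 5 = 0, not > 0 -> 0 (cannot go)
```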

38 of 51

Exercise: Can I go on this roller coaster?

You can go on the roller coaster if you have a ticket, are tall enough, and are not carrying any drink with you.

[Diagram: inputs “have ticket” (weight w1), “tall enough” (weight w2), and “carrying drink” (weight w3) feed a unit with an unknown threshold; its output means “can go on ride”.]

What should the values of w1, w2, w3, and the threshold be?

Hint: the weights and the threshold can be positive or negative numbers.

39 of 51

Draw a neural network to answer this question:

Exercise: Should we break for lunch?

There are 5 team members (x1, x2, x3, x4, x5).

An input xi is 1 when team member i is hungry; otherwise it’s 0.

Break when more than half of the team is hungry.

What should the neural network look like?


40 of 51

Negative threshold: can I buy milk?

If the supermarket is closed, buy at the bodega. If the bodega is closed, buy at the supermarket. If both are closed, cannot buy milk.

[Diagram: inputs “supermarket closed” and “bodega closed” each have weight -1; the unit’s threshold is unknown, and its output means “can buy milk”.]

x1 (supermarket closed) | x2 (bodega closed) | Sum | Output
0 | 0 | 0 | 1
1 | 0 | -1 | 1
0 | 1 | -1 | 1
1 | 1 | -2 | 0

What should the threshold be?
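A small Python sketch for experimenting with candidate thresholds against the table above; the candidate value below is just a starting guess, not the answer:

```python
def can_buy_milk(supermarket_closed, bodega_closed, threshold):
    """Both inputs have weight -1; output 1 when the sum exceeds the threshold."""
    total = -1 * supermarket_closed + -1 * bodega_closed
    return 1 if total > threshold else 0

candidate = 0  # try other values until every row matches the desired output
for x1, x2, desired in [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]:
    print(x1, x2, "got", can_buy_milk(x1, x2, candidate), "desired", desired)
```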

41 of 51

Challenge Questions


42 of 51

“Mismatch” Function (Exclusive OR)

“Mismatch” outputs a 1 when its two inputs disagree: exactly one of them is a 1.

The technical name is “exclusive OR”, or “XOR”.

Can we compute XOR using a single linear threshold unit?

x1 | x2 | x1 XOR x2
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0

[Diagram: a single linear threshold unit with inputs x1 and x2, weights w1 and w2, and an unknown threshold, trying to output x1 XOR x2.]

43 of 51

“Mismatch” Function (Exclusive OR)


Can we compute XOR using a single linear threshold unit?


No!

We can prove mathematically that XOR cannot be computed by a single linear threshold unit.

… But we can compute XOR using hidden units.

44 of 51

Mismatch Solution 1

[Diagram: two hidden units and an output unit, each with threshold > 0.
Hidden unit 1 computes “x1 and not x2”: weight +1 from x1 and -1 from x2.
Hidden unit 2 computes “x2 and not x1”: weight +1 from x2 and -1 from x1.
The output unit computes “(x1 and not x2) or (x2 and not x1)”: weight +1 from each hidden unit.]

45 of 51

Mismatch Solution 2

[Diagram: two hidden units and an output unit.
Hidden unit 1 computes “x1 or x2”: weight +1 from each input, threshold > 0.
Hidden unit 2 computes “x1 and x2”: weight +1 from each input, threshold > 1.
The output unit computes “(x1 or x2) and not (x1 and x2)”: weight +1 from the OR unit, -1 from the AND unit, threshold > 0.]
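A minimal Python sketch of Solution 2: an OR unit and an AND unit feeding an output unit that subtracts the AND from the OR (Solution 1 can be written the same way with its own weights and thresholds):

```python
def ltu(inputs, weights, threshold):
    """Linear threshold unit: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

def xor(x1, x2):
    h_or = ltu([x1, x2], [1, 1], 0)        # hidden unit: "x1 or x2"
    h_and = ltu([x1, x2], [1, 1], 1)       # hidden unit: "x1 and x2"
    return ltu([h_or, h_and], [1, -1], 0)  # "(x1 or x2) and not (x1 and x2)"

for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(a, b, "->", xor(a, b))  # matches the XOR truth table
```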

46 of 51

Pro tip: replace thresholds with bias connections

We can replace the threshold t with a bias connection with weight w0 = -t. This lets us fix the threshold at 0 for all units. Now our units only have one type of parameter (the weight vector) instead of two types, which simplifies things.

[Diagram: on the left, a unit with inputs x1, x2, x3, weights w1, w2, w3, and threshold t. On the right, the equivalent unit with threshold 0: an extra “bias connection” from a constant input of 1, with weight w0 = -t.]
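A short sketch of the equivalence: a unit with threshold t behaves exactly like a zero-threshold unit that has an extra constant input of 1 with weight w0 = -t (the example weights here are made up for illustration):

```python
def unit_with_threshold(xs, ws, t):
    return 1 if sum(x * w for x, w in zip(xs, ws)) > t else 0

def unit_with_bias(xs, ws, w0):
    # The constant input 1 carries the bias weight; the threshold is fixed at 0.
    return 1 if w0 * 1 + sum(x * w for x, w in zip(xs, ws)) > 0 else 0

ws, t = [1, 1, -5], 1   # illustrative weights and threshold
for xs in [[0, 0, 0], [1, 1, 0], [1, 1, 1]]:
    print(unit_with_threshold(xs, ws, t), unit_with_bias(xs, ws, -t))  # always equal
```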

47 of 51

Module 3.5 Takeaways


48 of 51

Learning Algorithms

  • Building neural nets by hand requires careful attention to the weights.
    • Only feasible for small networks.
  • What we need is a learning algorithm that can set the weights for us.
  • All we have to do is show it some training data so it knows what output we want to produce for each input.
  • The “will this dog bite me” network was trained using a very simple learning algorithm...


49 of 51

Relationship between Big Ideas 2 and 3

  • The dog bite predicting neural network is doing reasoning.
    • It’s solving a classification problem.
  • The input representation used by the network is a feature vector.
    • Each expert contributes one feature.
    • The features could be binary (-1/+1) or real values from -1 to +1.
  • Another important representation is the weight vector.
  • The learning algorithm adjusts the weight vector so as to improve the performance of the reasoner. Once learning is complete, using the network to classify new dog images is reasoning (Module 3.1).


50 of 51

The neural net learning rule

Neural net learning rules describe how weights change with experience.

We can write the learning rule for our dog bite network as an equation that explains how the weight wi of the ith expert changes based on the output xi and the desired output d (from the teacher) for the current training example.

Δwi = 0.1 · xi · d

Notation: Δwi (“delta wi”) means “change in wi”.

Notice that if xi and d are either both +1 or both -1, the weight wi will increase. If they have opposite signs, the weight will decrease. If xi is zero, wi will not change.

This is a very simple learning rule that doesn’t work for networks with hidden layers. Most neural nets use something more complex, but the form is similar.
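A minimal Python sketch of this rule applied to one training example; the expert answers and desired output below reproduce the first dog-bite update shown earlier in the module:

```python
LEARNING_RATE = 0.1

def update_weights(weights, expert_outputs, desired):
    """Apply delta w_i = 0.1 * x_i * d to every expert's weight."""
    return [w + LEARNING_RATE * x * desired
            for w, x in zip(weights, expert_outputs)]

weights = [1.0] * 7                     # every expert starts with 1 vote
answers = [-1, -1, -1, -1, +1, +1, +1]  # the experts' answers for the first dog
desired = -1                            # teacher's label: the dog doesn't bite

print(update_weights(weights, answers, desired))
# correct (NO) experts rise to 1.1; wrong (YES) experts fall to 0.9
```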


51 of 51

Backpropagation learning rule: allows for hidden layers

[Diagram: during training, an error signal is propagated backwards through the network. Error measure: Mean Square Error (d - y)².]