Neural Networks
Unit 3, Module 3.5
A lot of the AI applications you’ve used have neural networks inside of them.
This is the dominant technology today for many areas of AI.
Only in the last 15 years have neural nets become good enough for practical use. Let's look at where they started.
Decision Trees vs. Neural Networks
We learned how to build decision trees. They are useful for some types of problems.
But for really hard problems, neural networks work better. Why?
Decision trees depend on having a good feature set. What if we don’t?
Neural networks can create their own features. This allows them to come up with sophisticated solutions to hard problems, such as understanding speech or images, that decision trees can’t handle.
Why are they called “neural” networks?
Beware the “hype” around neural networks
Anyone who tells you that neural nets “work the way the brain does” is wrong. They have bought the hype.
Don’t buy the hype.
Neural Networks in a Minute
What is a neural network?
[Diagram: a network of units connected by weighted connections; each unit holds a number, and each connection carries a number (its weight).]
Neural Networks Are Organized In Layers
[Diagram: Input Layer, Hidden Layer, Output Layer.]
Deep Neural Networks Have Many Layers
Let’s focus on how one neuron learns
Learning to predict dog bites
[Diagram: seven "experts", each starting with 1 vote.]
Start with a bunch of "experts". This is our input layer.
Some experts are good for predicting dog bites. But which are the good ones?
Every expert starts out with 1 vote.
Is this dog going to bite me? (1)
[Diagram: the seven experts (1 vote each) answer NO, NO, NO, NO, YES, YES, YES. Tally: No = 4 votes, Yes = 3 votes.]
This dog doesn't bite. Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The four NO experts now have 1.1 votes each, and the three YES experts have 0.9 votes each.
Is this dog going to bite me? (2)
[Diagram: the experts answer NO (1.1 votes), NO (1.1), YES (1.1), YES (1.1), YES (0.9), NO (0.9), NO (0.9). Tally: No = 4.0 votes, Yes = 3.1 votes.]
This dog does bite! Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The vote counts become 1.0, 1.0, 1.2, 1.2, 1.0, 0.8, 0.8.
Is this dog going to bite me? (3)
[Diagram: the experts answer NO (1.0 votes), YES (1.0), NO (1.2), NO (1.2), NO (1.0), YES (0.8), NO (0.8). Tally: No = 5.2 votes, Yes = 1.8 votes.]
This dog doesn't bite. Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The vote counts become 1.1, 0.9, 1.3, 1.3, 1.1, 0.7, 0.9.
Is this dog going to bite me? (4)
[Diagram: the experts answer YES (1.1 votes), NO (0.9), YES (1.3), YES (1.3), YES (1.1), NO (0.7), NO (0.9). Tally: No = 2.5 votes, Yes = 4.8 votes.]
This dog does bite! Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The vote counts become 1.2, 0.8, 1.4, 1.4, 1.2, 0.6, 0.8.
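The voting procedure above can be sketched in a few lines of Python. This is a minimal sketch of the slides' update rule, not a full learning algorithm; the function names are mine, and the 0.1-vote step comes from the slides.

```python
def tally(votes, answers):
    """Total the votes behind 'no' (-1) and 'yes' (+1) answers."""
    no = sum(v for v, a in zip(votes, answers) if a == -1)
    yes = sum(v for v, a in zip(votes, answers) if a == +1)
    return round(no, 1), round(yes, 1)

def update(votes, answers, truth):
    """Add 0.1 vote to experts who matched the true answer; subtract 0.1 from the rest."""
    return [round(v + 0.1, 1) if a == truth else round(v - 0.1, 1)
            for v, a in zip(votes, answers)]

# Dog 1: four experts say NO (-1), three say YES (+1); the dog doesn't bite.
votes = [1.0] * 7
answers = [-1, -1, -1, -1, +1, +1, +1]
print(tally(votes, answers))          # No gets 4.0 votes, Yes gets 3.0
votes = update(votes, answers, truth=-1)
print(votes)                          # NO experts rise to 1.1, YES experts drop to 0.9
```

Running the same two steps on each dog in turn reproduces the vote counts shown on the slides.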
Turning our dog bite predictor into a neural network
[Diagram: the seven experts feed a Σ>0 unit (a "neuron") through weighted connections; the weights are the final vote counts 1.2, 0.8, 1.4, 1.4, 1.2, 0.6, 0.8.]
The experts supply inputs xi = +1 for "yes" or -1 for "no".
wi are the weights on inputs xi ("weight" = the number of votes that expert gets).
"Activation": a = Σ wixi
a > 0 means "dog will bite".
Let's look at how the neuron computes its answer.
The simplest possible unit: the linear unit
[Diagram: inputs x1 and x2, weights w1 and w2, a summation unit ∑, and output y.]
Each input xi is multiplied by the corresponding weight wi; the results are summed:
x1 × w1 + x2 × w2 = y
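The computation above is a one-liner in Python. A minimal sketch (the function name is mine); the weights w1 = 5, w2 = -2 match the worked example on the following slides.

```python
def linear_unit(inputs, weights):
    # Multiply each input by its weight and sum the results.
    return sum(x * w for x, w in zip(inputs, weights))

# Fixed weights w1 = 5, w2 = -2.
print(linear_unit([2, 3], [5, -2]))   # 2 * 5 + 3 * (-2) = 4
```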
Example with fixed weights and varying inputs:
[Diagram: inputs x1, x2, fixed weights w1 = 5, w2 = -2, output 4.]
x1 × w1 + x2 × w2 = y
2 × 5 + 3 × (-2) = 4

Varying integer (whole number) inputs, fixed weights:
x1 | x2
2 | 3
1 | 4
3 | -2
-2 | -5
[Diagram: the same unit, weights w1 = 5, w2 = -2, output -3.]
Exercise: enter the missing weights and solve the problem:
1 × __ + 4 × ___ = ___
[Diagram: the same unit, weights w1 = 5, w2 = -2, output 19.]
Exercise: enter the missing input values highlighted in blue in the table and solve the problem:
_ × 5 + ___ × -2 = ___
[Diagram: the same unit, weights w1 = 5, w2 = -2, output 0.]
Exercise: enter the missing input values highlighted in blue in the table, enter the missing weights from the diagram, and solve the problem:
__ × __ + __ × __ = ___
Linear Threshold Unit
The linear threshold unit produces a binary output: 0 for "no" or 1 for "yes".
[Diagram: inputs x1, x2 with weights w1, w2 feed a ∑>0 unit.]
Activation: a = x1 × w1 + x2 × w2
Output: y = 1 if a > 0, y = 0 if a ≤ 0
Remember the Perceptron: binary outputs, 0 or 1.
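In code, the threshold is just a comparison after the weighted sum. A minimal sketch (function name is mine), reusing the weights w1 = 5, w2 = -2 from the linear unit examples:

```python
def linear_threshold_unit(inputs, weights, threshold=0):
    # Activation: the weighted sum of the inputs.
    a = sum(x * w for x, w in zip(inputs, weights))
    # Binary output: 1 ("yes") if the activation exceeds the threshold, else 0 ("no").
    return 1 if a > threshold else 0

print(linear_threshold_unit([2, 3], [5, -2]))   # activation 4 > 0, so 1
print(linear_threshold_unit([1, 4], [5, -2]))   # activation -3 <= 0, so 0
```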
Should I wear boots today?
[Diagram: inputs "raining" and "snowing", each with weight 1, feed a ∑>0 unit whose output is "wear boots".]
Binary inputs: each input value can only be yes (1) or no (0).
Use the values in the table to fill in the Sum and Output columns (possible answers: 1 = Yes or 0 = No). Note: this unit's threshold requires the sum to be greater than 0 to trigger "wear boots".

Raining? | Snowing? | Sum | Output: Wear Boots
No (0) | No (0) | 0 |
Yes (1) | No (0) | 1 |
No (0) | Yes (1) | 1 |
Yes (1) | Yes (1) | 2 |
Can I make a PBJ sandwich?
Must have peanut butter, jelly, and bread.
[Diagram: inputs "peanut butter", "jelly", and "bread", each with weight 1, feed a ∑>2 unit whose output is "can make sandwich".]
Use the values in the table to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 2 to trigger "make a sandwich".

Peanut butter | Jelly | Bread | Sum | Output: Sandwich
No (0) | No (0) | No (0) | 0 | 0 (No)
Yes (1) | No (0) | No (0) | 1 |
No (0) | Yes (1) | No (0) | |
Yes (1) | Yes (1) | No (0) | |
No (0) | No (0) | Yes (1) | |
Yes (1) | No (0) | Yes (1) | |
No (0) | Yes (1) | Yes (1) | 2 |
Yes (1) | Yes (1) | Yes (1) | |
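With all weights 1 and a threshold of 2, this unit acts as an AND over the three ingredients. A sketch for checking the table (function name is mine):

```python
from itertools import product

def can_make_sandwich(pb, jelly, bread):
    # All three weights are 1; the sum exceeds 2 only when every ingredient is present.
    return 1 if 1 * pb + 1 * jelly + 1 * bread > 2 else 0

# Print the full truth table from the slide.
for pb, jelly, bread in product([0, 1], repeat=3):
    print(pb, jelly, bread, "->", can_make_sandwich(pb, jelly, bread))
```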
Unequal Weights
Does John get to eat dessert tonight?
If he ate his vegetables, finished his milk, and cleared the table, then yes. But if the dessert is healthy (fruit, not cake or ice cream), he can always have it.
[Diagram: inputs "healthy dessert" (weight 5), "ate veggies", "finished milk", and "cleared table" (weight 1 each) feed a ∑>2 unit whose output is "gets dessert".]
Use the values in the table to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 2 to trigger "gets dessert".

Healthy Dessert | Ate Vegetables | Finished Milk | Cleared Table | Sum | Output
No (0) | No (0) | No (0) | No (0) | 0 | 0
Yes (1) | No (0) | No (0) | No (0) | |
No (0) | No (0) | Yes (1) | No (0) | |
No (0) | Yes (1) | No (0) | Yes (1) | |
No (0) | Yes (1) | Yes (1) | Yes (1) | |
Yes (1) | Yes (1) | Yes (1) | Yes (1) | |
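The unequal weight makes one input decisive. A sketch of this unit (function name is mine): the weight of 5 on "healthy dessert" clears the >2 threshold by itself, while the three weight-1 conditions only clear it together.

```python
def gets_dessert(healthy, veggies, milk, cleared):
    total = 5 * healthy + 1 * veggies + 1 * milk + 1 * cleared
    return 1 if total > 2 else 0

print(gets_dessert(1, 0, 0, 0))   # healthy dessert alone: sum 5 > 2, output 1
print(gets_dessert(0, 1, 1, 0))   # only two chores done: sum 2, not > 2, output 0
```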
Negative weights
Negative weights: can Lucy buy garden supplies?
She can buy supplies if the store is open and she has money, unless her car is out of gas or has a flat tire.
[Diagram: inputs "store is open" (weight 1), "has money" (weight 1), "out of gas" (weight -5), and "flat tire" (weight -5) feed a ∑>1 unit whose output is "can buy supplies".]
Use the values in the table to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 1 to trigger "can buy supplies".

Store open | Have money | Out of gas | Flat tire | Sum | Output
No (0) | No (0) | No (0) | No (0) | |
Yes (1) | No (0) | No (0) | No (0) | |
No (0) | Yes (1) | No (0) | Yes (1) | |
Yes (1) | Yes (1) | No (0) | No (0) | |
Yes (1) | Yes (1) | Yes (1) | No (0) | |
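A sketch of this unit (function name is mine): the large negative weights let either car problem veto the purchase, since -5 outweighs anything the two positive inputs can add.

```python
def can_buy_supplies(store_open, has_money, out_of_gas, flat_tire):
    total = 1 * store_open + 1 * has_money - 5 * out_of_gas - 5 * flat_tire
    return 1 if total > 1 else 0

print(can_buy_supplies(1, 1, 0, 0))   # sum 2 > 1: output 1
print(can_buy_supplies(1, 1, 1, 0))   # sum 2 - 5 = -3: output 0
```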
Numbers as inputs
Integer instead of binary inputs: Can we go on a class trip?
The number of adults must exceed the number of children. Inputs to the network are integers.
[Diagram: inputs "# adults" (weight 1) and "# children" (weight -1) feed a ∑>0 unit whose output is "can go".]
Use the values in the table and the weights in the diagram to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 0 to trigger "can go".

# of Adults | # of Children | Sum | Output
1 | 1 | |
2 | 3 | |
3 | 2 | |
5 | 5 | |
7 | 3 | |
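A sketch of this unit with integer inputs (function name is mine): weight +1 per adult and -1 per child, firing only when adults outnumber children.

```python
def can_go_on_trip(adults, children):
    return 1 if 1 * adults - 1 * children > 0 else 0

# The rows from the slide's table.
for adults, children in [(1, 1), (2, 3), (3, 2), (5, 5), (7, 3)]:
    print(adults, children, "->", can_go_on_trip(adults, children))
```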
Exercise: Can I go on this roller coaster?
You can go on the roller coaster if you have a ticket, are tall enough, and are not carrying any drink with you.
[Diagram: inputs "have ticket" (w1), "tall enough" (w2), and "carrying drink" (w3) feed a ∑>? unit whose output is "can go on ride".]
What should the values of w1, w2, w3, and the threshold be?
Hint: the weights and the threshold can be positive or negative numbers.
Draw a neural network to answer this question.
Exercise: Should we break for lunch?
There are 5 team members (x1, x2, x3, x4, x5).
Input xi is 1 when team member i is hungry; otherwise it's 0.
Break when more than half of the team is hungry.
What should the neural network look like?
Negative threshold: can I buy milk?
If the supermarket is closed, buy at the bodega. If the bodega is closed, buy at the supermarket. If both are closed, she cannot buy milk.
[Diagram: inputs "supermarket closed" (x1) and "bodega closed" (x2), each with weight -1, feed a ∑>? unit whose output is "can buy milk".]

x1 | x2 | Sum | Output
0 | 0 | 0 | 1
1 | 0 | -1 | 1
0 | 1 | -1 | 1
1 | 1 | -2 | 0
What should the threshold be?
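The table already fixes the sums, so a candidate threshold can be checked mechanically. A sketch verifying one threshold that reproduces the table (function name and the particular threshold choice are mine):

```python
def can_buy_milk(super_closed, bodega_closed, threshold=-2):
    # Both weights are -1; with this negative threshold the unit fires
    # unless both stores are closed.
    a = -1 * super_closed + -1 * bodega_closed
    return 1 if a > threshold else 0

rows = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
print(all(can_buy_milk(s, b) == out for s, b, out in rows))
```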
Challenge Questions
“Mismatch” Function (Exclusive OR)
“Mismatch” outputs a 1 when its two inputs disagree: exactly one of them is a 1.
The technical name is “exclusive OR”, or “XOR”.
Can we compute XOR using a single linear threshold unit?
[Diagram: inputs x1, x2 with weights w1, w2 feed a single ∑>? unit whose output should be x1 XOR x2.]

x1 | x2 | x1 XOR x2
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0
No!
We can prove mathematically that XOR cannot be computed by a single linear threshold unit.
… But we can compute XOR using hidden units.
Mismatch Solution 1
[Diagram: hidden unit A (∑>0) receives x1 with weight 1 and x2 with weight -1, computing "x1 and not x2". Hidden unit B (∑>0) receives x1 with weight -1 and x2 with weight 1, computing "x2 and not x1". Both hidden units feed the output unit (∑>0) with weight 1 each, computing "(x1 and not x2) or (x2 and not x1)".]
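Solution 1 can be checked directly in code. A minimal sketch (function names are mine) using the weights and thresholds from the diagram:

```python
def step(a):
    # Linear threshold: fire (1) when the activation exceeds 0.
    return 1 if a > 0 else 0

def xor(x1, x2):
    h1 = step(1 * x1 + -1 * x2)    # hidden unit: "x1 and not x2"
    h2 = step(-1 * x1 + 1 * x2)    # hidden unit: "x2 and not x1"
    return step(1 * h1 + 1 * h2)   # output unit: OR of the two hidden units

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", xor(x1, x2))
```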
Mismatch Solution 2
[Diagram: hidden unit A (∑>0) receives x1 and x2 with weight 1 each, computing "x1 or x2". Hidden unit B (∑>1) receives x1 and x2 with weight 1 each, computing "x1 and x2". The output unit (∑>0) receives A with weight 1 and B with weight -1, computing "(x1 or x2) and not (x1 and x2)".]
Pro tip: replace thresholds with bias connections
We can replace the threshold t with a bias connection with weight w0 = -t. This lets us fix the threshold at 0 for all units. Now our units only have one type of parameter (the weight vector) instead of two types, which simplifies things.
[Diagram, left: a ∑>t unit with inputs x1, x2, x3 and weights w1, w2, w3, producing output y. Right: the same unit as ∑>0, with an added "bias connection": a constant input of 1 whose weight is w0 = -t.]
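The equivalence is easy to verify in code, since a > t exactly when a - t > 0. A sketch (function names are mine):

```python
def unit_with_threshold(xs, ws, t):
    return 1 if sum(x * w for x, w in zip(xs, ws)) > t else 0

def unit_with_bias(xs, ws, t):
    # Prepend a constant input of 1 whose weight is w0 = -t;
    # the threshold is now fixed at 0 for every unit.
    xs = [1] + list(xs)
    ws = [-t] + list(ws)
    return 1 if sum(x * w for x, w in zip(xs, ws)) > 0 else 0

# The two formulations agree on every input; e.g. weights [5, -2], threshold 3:
for xs in [[2, 3], [1, 4], [3, -2], [-2, -5]]:
    assert unit_with_threshold(xs, [5, -2], 3) == unit_with_bias(xs, [5, -2], 3)
```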
Module 3.5 Takeaways
Learning Algorithms
Relationship between Big Ideas 2 and 3
The neural net learning rule
Neural net learning rules describe how weights change with experience.
We can write the learning rule for our dog bite network as an equation that describes how the weight wi of the ith expert changes, based on that expert's output xi and the desired output d (from the teacher) for the current training example:

Δwi = 0.1 · xi · d

Notation: Δwi ("delta wi") means "change in wi".

Notice that if xi and d are either both +1 or both -1, the weight wi will increase. If they have opposite signs, the weight will decrease. If xi is zero, wi will not change.

This is a very simple learning rule that doesn't work for networks with hidden layers. Most neural nets use something more complex, but the form is similar.
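The rule Δwi = 0.1 · xi · d, sketched in code with outputs as +1/-1 as in the dog bite network (function name is mine; the 0.1 rate comes from the slide):

```python
def apply_learning_rule(weights, outputs, desired, rate=0.1):
    # delta w_i = rate * x_i * d: a weight grows when x_i agrees in sign
    # with d, shrinks when they disagree, and is unchanged when x_i is 0.
    return [round(w + rate * x * desired, 1) for w, x in zip(weights, outputs)]

w = apply_learning_rule([1.0, 1.0, 1.0], outputs=[+1, -1, +1], desired=+1)
print(w)   # [1.1, 0.9, 1.1]: agreeing experts gain, the disagreeing one loses
```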
Backpropagation learning rule: allows for hidden layers
Error signal: mean squared error, (d − y)²