Neural Networks
Unit 3, Module 3.5
A lot of the AI applications you’ve used have neural networks inside of them.
This is the dominant technology today for many areas of AI.
Only in the last 15 years have neural nets become good enough for practical use. Let's look at where they started.
Decision Trees vs. Neural Networks
We learned how to build decision trees. They are useful for some types of problems.
But for really hard problems, neural networks work better. Why?
Decision trees depend on having a good feature set. What if we don’t?
Neural networks can create their own features. This allows them to come up with sophisticated solutions to hard problems, such as understanding speech or images, that decision trees can’t handle.
Why are they called “neural” networks?
Beware the “hype” around neural networks
Anyone who tells you that neural nets “work the way the brain does” is wrong. They have bought the hype.
Don’t buy the hype.
Neural Networks in a Minute
What is a neural network?
[Diagram: a network of units connected by weighted connections; each unit holds a number, and each connection carries a number (its weight).]
Neural Networks Are Organized In Layers
[Diagram: Input Layer, Hidden Layer, Output Layer.]
Deep Neural Networks Have Many Layers
Let’s focus on how one neuron learns
Learning to predict dog bites
[Diagram: seven "experts", each starting with 1 vote.]
Start with a bunch of "experts". This is our input layer.
Some experts are good for predicting dog bites. But which are the good ones?
Every expert starts out with 1 vote.
Is this dog going to bite me? (1)
[Diagram: the seven experts (1 vote each) answer NO, NO, NO, NO, YES, YES, YES. Tally: No = 4 votes, Yes = 3 votes.]
This dog doesn't bite. Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The four NO experts now have 1.1 votes each, and the three YES experts have 0.9 votes each.
Is this dog going to bite me? (2)
[Diagram: the experts answer NO (1.1 votes), NO (1.1), YES (1.1), YES (1.1), YES (0.9), NO (0.9), NO (0.9). Tally: No = 4.0 votes, Yes = 3.1 votes.]
This dog does bite! Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The vote counts become 1.0, 1.0, 1.2, 1.2, 1.0, 0.8, 0.8.
Is this dog going to bite me? (3)
[Diagram: the experts answer NO (1.0 votes), YES (1.0), NO (1.2), NO (1.2), NO (1.0), YES (0.8), NO (0.8). Tally: No = 5.2 votes, Yes = 1.8 votes.]
This dog doesn't bite. Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The vote counts become 1.1, 0.9, 1.3, 1.3, 1.1, 0.7, 0.9.
Is this dog going to bite me? (4)
[Diagram: the experts answer YES (1.1 votes), NO (0.9), YES (1.3), YES (1.3), YES (1.1), NO (0.7), NO (0.9). Tally: No = 2.5 votes, Yes = 4.8 votes.]
This dog does bite! Give more votes to the experts who answered correctly; subtract votes from the ones who got it wrong. The vote counts become 1.2, 0.8, 1.4, 1.4, 1.2, 0.6, 0.8.
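The voting procedure above can be sketched in a few lines of Python. This is a minimal sketch of the slides' update rule, not a full learning algorithm; the function names are mine, and the 0.1-vote step comes from the slides.

```python
def tally(votes, answers):
    """Total the votes behind 'no' (-1) and 'yes' (+1) answers."""
    no = sum(v for v, a in zip(votes, answers) if a == -1)
    yes = sum(v for v, a in zip(votes, answers) if a == +1)
    return round(no, 1), round(yes, 1)

def update(votes, answers, truth):
    """Add 0.1 vote to experts who matched the true answer; subtract 0.1 from the rest."""
    return [round(v + 0.1, 1) if a == truth else round(v - 0.1, 1)
            for v, a in zip(votes, answers)]

# Dog 1: four experts say NO (-1), three say YES (+1); the dog doesn't bite.
votes = [1.0] * 7
answers = [-1, -1, -1, -1, +1, +1, +1]
print(tally(votes, answers))          # No gets 4.0 votes, Yes gets 3.0
votes = update(votes, answers, truth=-1)
print(votes)                          # NO experts rise to 1.1, YES experts drop to 0.9
```

Running the same two steps on each dog in turn reproduces the vote counts shown on the slides.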
Turning our dog bite predictor into a neural network
[Diagram: the seven experts feed a Σ>0 unit (a "neuron") through weighted connections; the weights are the final vote counts 1.2, 0.8, 1.4, 1.4, 1.2, 0.6, 0.8.]
The experts supply inputs xi = +1 for "yes" or -1 for "no".
wi are the weights on inputs xi ("weight" = the number of votes that expert gets).
"Activation": a = Σ wixi
a > 0 means "dog will bite".
Let's look at how the neuron computes its answer.
The simplest possible unit: the linear unit
[Diagram: inputs x1 and x2, weights w1 and w2, a summation unit ∑, and output y.]
Each input xi is multiplied by the corresponding weight wi; the results are summed:
x1 × w1 + x2 × w2 = y
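The computation above is a one-liner in Python. A minimal sketch (the function name is mine); the weights w1 = 5, w2 = -2 match the worked example on the following slides.

```python
def linear_unit(inputs, weights):
    # Multiply each input by its weight and sum the results.
    return sum(x * w for x, w in zip(inputs, weights))

# Fixed weights w1 = 5, w2 = -2.
print(linear_unit([2, 3], [5, -2]))   # 2 * 5 + 3 * (-2) = 4
```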
Example with fixed weights and varying inputs:
[Diagram: inputs x1, x2, fixed weights w1 = 5, w2 = -2, output 4.]
x1 × w1 + x2 × w2 = y
2 × 5 + 3 × (-2) = 4

Varying integer (whole number) inputs, fixed weights:
x1 | x2
2 | 3
1 | 4
3 | -2
-2 | -5
[Diagram: the same unit, weights w1 = 5, w2 = -2, output -3.]
Exercise: enter the missing weights and solve the problem:
1 × __ + 4 × ___ = ___
[Diagram: the same unit, weights w1 = 5, w2 = -2, output 19.]
Exercise: enter the missing input values highlighted in blue in the table and solve the problem:
_ × 5 + ___ × -2 = ___
[Diagram: the same unit, weights w1 = 5, w2 = -2, output 0.]
Exercise: enter the missing input values highlighted in blue in the table, enter the missing weights from the diagram, and solve the problem:
__ × __ + __ × __ = ___
Linear Threshold Unit
The linear threshold unit produces a binary output: 0 for "no" or 1 for "yes".
[Diagram: inputs x1, x2 with weights w1, w2 feed a ∑>0 unit.]
Activation: a = x1 × w1 + x2 × w2
Output: y = 1 if a > 0, y = 0 if a ≤ 0
Remember the Perceptron: binary outputs, 0 or 1.
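In code, the threshold is just a comparison after the weighted sum. A minimal sketch (function name is mine), reusing the weights w1 = 5, w2 = -2 from the linear unit examples:

```python
def linear_threshold_unit(inputs, weights, threshold=0):
    # Activation: the weighted sum of the inputs.
    a = sum(x * w for x, w in zip(inputs, weights))
    # Binary output: 1 ("yes") if the activation exceeds the threshold, else 0 ("no").
    return 1 if a > threshold else 0

print(linear_threshold_unit([2, 3], [5, -2]))   # activation 4 > 0, so 1
print(linear_threshold_unit([1, 4], [5, -2]))   # activation -3 <= 0, so 0
```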
Should I wear boots today?
[Diagram: inputs "raining" and "snowing", each with weight 1, feed a ∑>0 unit whose output is "wear boots".]
Binary inputs: each input value can only be yes (1) or no (0).
Use the values in the table to fill in the Sum and Output columns (possible answers: 1 = Yes or 0 = No). Note: this unit's threshold requires the sum to be greater than 0 to trigger "wear boots".

Raining? | Snowing? | Sum | Output: Wear Boots
No (0) | No (0) | 0 |
Yes (1) | No (0) | 1 |
No (0) | Yes (1) | 1 |
Yes (1) | Yes (1) | 2 |
Can I make a PBJ sandwich?
Must have peanut butter, jelly, and bread.
[Diagram: inputs "peanut butter", "jelly", and "bread", each with weight 1, feed a ∑>2 unit whose output is "can make sandwich".]
Use the values in the table to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 2 to trigger "make a sandwich".

Peanut butter | Jelly | Bread | Sum | Output: Sandwich
No (0) | No (0) | No (0) | 0 | 0 (No)
Yes (1) | No (0) | No (0) | 1 |
No (0) | Yes (1) | No (0) | |
Yes (1) | Yes (1) | No (0) | |
No (0) | No (0) | Yes (1) | |
Yes (1) | No (0) | Yes (1) | |
No (0) | Yes (1) | Yes (1) | 2 |
Yes (1) | Yes (1) | Yes (1) | |
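With all weights 1 and a threshold of 2, this unit acts as an AND over the three ingredients. A sketch for checking the table (function name is mine):

```python
from itertools import product

def can_make_sandwich(pb, jelly, bread):
    # All three weights are 1; the sum exceeds 2 only when every ingredient is present.
    return 1 if 1 * pb + 1 * jelly + 1 * bread > 2 else 0

# Print the full truth table from the slide.
for pb, jelly, bread in product([0, 1], repeat=3):
    print(pb, jelly, bread, "->", can_make_sandwich(pb, jelly, bread))
```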
Unequal Weights
Does John get to eat dessert tonight?
If he ate his vegetables, finished his milk, and cleared the table, then yes. But if the dessert is healthy (fruit, not cake or ice cream), he can always have it.
[Diagram: inputs "healthy dessert" (weight 5), "ate veggies", "finished milk", and "cleared table" (weight 1 each) feed a ∑>2 unit whose output is "gets dessert".]
Use the values in the table to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 2 to trigger "gets dessert".

Healthy Dessert | Ate Vegetables | Finished Milk | Cleared Table | Sum | Output
No (0) | No (0) | No (0) | No (0) | 0 | 0
Yes (1) | No (0) | No (0) | No (0) | |
No (0) | No (0) | Yes (1) | No (0) | |
No (0) | Yes (1) | No (0) | Yes (1) | |
No (0) | Yes (1) | Yes (1) | Yes (1) | |
Yes (1) | Yes (1) | Yes (1) | Yes (1) | |
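The unequal weight makes one input decisive. A sketch of this unit (function name is mine): the weight of 5 on "healthy dessert" clears the >2 threshold by itself, while the three weight-1 conditions only clear it together.

```python
def gets_dessert(healthy, veggies, milk, cleared):
    total = 5 * healthy + 1 * veggies + 1 * milk + 1 * cleared
    return 1 if total > 2 else 0

print(gets_dessert(1, 0, 0, 0))   # healthy dessert alone: sum 5 > 2, output 1
print(gets_dessert(0, 1, 1, 0))   # only two chores done: sum 2, not > 2, output 0
```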
Negative weights
Negative weights: can Lucy buy garden supplies?
She can buy supplies if the store is open and she has money, unless her car is out of gas or has a flat tire.
[Diagram: inputs "store is open" (weight 1), "has money" (weight 1), "out of gas" (weight -5), and "flat tire" (weight -5) feed a ∑>1 unit whose output is "can buy supplies".]
Use the values in the table to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 1 to trigger "can buy supplies".

Store open | Have money | Out of gas | Flat tire | Sum | Output
No (0) | No (0) | No (0) | No (0) | |
Yes (1) | No (0) | No (0) | No (0) | |
No (0) | Yes (1) | No (0) | Yes (1) | |
Yes (1) | Yes (1) | No (0) | No (0) | |
Yes (1) | Yes (1) | Yes (1) | No (0) | |
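A sketch of this unit (function name is mine): the large negative weights let either car problem veto the purchase, since -5 outweighs anything the two positive inputs can add.

```python
def can_buy_supplies(store_open, has_money, out_of_gas, flat_tire):
    total = 1 * store_open + 1 * has_money - 5 * out_of_gas - 5 * flat_tire
    return 1 if total > 1 else 0

print(can_buy_supplies(1, 1, 0, 0))   # sum 2 > 1: output 1
print(can_buy_supplies(1, 1, 1, 0))   # sum 2 - 5 = -3: output 0
```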
Numbers as inputs
Integer instead of binary inputs: Can we go on a class trip?
The number of adults must exceed the number of children. Inputs to the network are integers.
[Diagram: inputs "# adults" (weight 1) and "# children" (weight -1) feed a ∑>0 unit whose output is "can go".]
Use the values in the table and the weights in the diagram to fill in the Sum and Output columns. Note: this unit's threshold requires the sum to be greater than 0 to trigger "can go".

# of Adults | # of Children | Sum | Output
1 | 1 | |
2 | 3 | |
3 | 2 | |
5 | 5 | |
7 | 3 | |
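A sketch of this unit with integer inputs (function name is mine): weight +1 per adult and -1 per child, firing only when adults outnumber children.

```python
def can_go_on_trip(adults, children):
    return 1 if 1 * adults - 1 * children > 0 else 0

# The rows from the slide's table.
for adults, children in [(1, 1), (2, 3), (3, 2), (5, 5), (7, 3)]:
    print(adults, children, "->", can_go_on_trip(adults, children))
```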
Exercise: Can I go on this roller coaster?
You can go on the roller coaster if you have a ticket, are tall enough, and are not carrying any drink with you.
[Diagram: inputs "have ticket" (w1), "tall enough" (w2), and "carrying drink" (w3) feed a ∑>? unit whose output is "can go on ride".]
What should the values of w1, w2, w3, and the threshold be?
Hint: the weights and the threshold can be positive or negative numbers.
Draw a neural network to answer this question.
Exercise: Should we break for lunch?
There are 5 team members (x1, x2, x3, x4, x5).
Input xi is 1 when team member i is hungry; otherwise it's 0.
Break when more than half of the team is hungry.
What should the neural network look like?
Negative threshold: can I buy milk?
If the supermarket is closed, buy at the bodega. If the bodega is closed, buy at the supermarket. If both are closed, she cannot buy milk.
[Diagram: inputs "supermarket closed" (x1) and "bodega closed" (x2), each with weight -1, feed a ∑>? unit whose output is "can buy milk".]

x1 | x2 | Sum | Output
0 | 0 | 0 | 1
1 | 0 | -1 | 1
0 | 1 | -1 | 1
1 | 1 | -2 | 0
What should the threshold be?
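The table already fixes the sums, so a candidate threshold can be checked mechanically. A sketch verifying one threshold that reproduces the table (function name and the particular threshold choice are mine):

```python
def can_buy_milk(super_closed, bodega_closed, threshold=-2):
    # Both weights are -1; with this negative threshold the unit fires
    # unless both stores are closed.
    a = -1 * super_closed + -1 * bodega_closed
    return 1 if a > threshold else 0

rows = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
print(all(can_buy_milk(s, b) == out for s, b, out in rows))
```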
Challenge Questions
“Mismatch” Function (Exclusive OR)
“Mismatch” outputs a 1 when its two inputs disagree: exactly one of them is a 1.
The technical name is “exclusive OR”, or “XOR”.
Can we compute XOR using a single linear threshold unit?
[Diagram: inputs x1, x2 with weights w1, w2 feed a single ∑>? unit whose output should be x1 XOR x2.]

x1 | x2 | x1 XOR x2
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0
No!
We can prove mathematically that XOR cannot be computed by a single linear threshold unit.
… But we can compute XOR using hidden units.
Mismatch Solution 1
[Diagram: hidden unit A (∑>0) receives x1 with weight 1 and x2 with weight -1, computing "x1 and not x2". Hidden unit B (∑>0) receives x1 with weight -1 and x2 with weight 1, computing "x2 and not x1". Both hidden units feed the output unit (∑>0) with weight 1 each, computing "(x1 and not x2) or (x2 and not x1)".]
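Solution 1 can be checked directly in code. A minimal sketch (function names are mine) using the weights and thresholds from the diagram:

```python
def step(a):
    # Linear threshold: fire (1) when the activation exceeds 0.
    return 1 if a > 0 else 0

def xor(x1, x2):
    h1 = step(1 * x1 + -1 * x2)    # hidden unit: "x1 and not x2"
    h2 = step(-1 * x1 + 1 * x2)    # hidden unit: "x2 and not x1"
    return step(1 * h1 + 1 * h2)   # output unit: OR of the two hidden units

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", xor(x1, x2))
```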
Mismatch Solution 2
[Diagram: hidden unit A (∑>0) receives x1 and x2 with weight 1 each, computing "x1 or x2". Hidden unit B (∑>1) receives x1 and x2 with weight 1 each, computing "x1 and x2". The output unit (∑>0) receives A with weight 1 and B with weight -1, computing "(x1 or x2) and not (x1 and x2)".]
Pro tip: replace thresholds with bias connections
We can replace the threshold t with a bias connection with weight w0 = -t. This lets us fix the threshold at 0 for all units. Now our units only have one type of parameter (the weight vector) instead of two types, which simplifies things.
[Diagram, left: a ∑>t unit with inputs x1, x2, x3 and weights w1, w2, w3, producing output y. Right: the same unit as ∑>0, with an added "bias connection": a constant input of 1 whose weight is w0 = -t.]
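The equivalence is easy to verify in code, since a > t exactly when a - t > 0. A sketch (function names are mine):

```python
def unit_with_threshold(xs, ws, t):
    return 1 if sum(x * w for x, w in zip(xs, ws)) > t else 0

def unit_with_bias(xs, ws, t):
    # Prepend a constant input of 1 whose weight is w0 = -t;
    # the threshold is now fixed at 0 for every unit.
    xs = [1] + list(xs)
    ws = [-t] + list(ws)
    return 1 if sum(x * w for x, w in zip(xs, ws)) > 0 else 0

# The two formulations agree on every input; e.g. weights [5, -2], threshold 3:
for xs in [[2, 3], [1, 4], [3, -2], [-2, -5]]:
    assert unit_with_threshold(xs, [5, -2], 3) == unit_with_bias(xs, [5, -2], 3)
```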
Module 3.5 Takeaways
Learning Algorithms
Relationship between Big Ideas 2 and 3
The neural net learning rule
Neural net learning rules describe how weights change with experience.
We can write the learning rule for our dog bite network as an equation that describes how the weight wi of the ith expert changes, based on that expert's output xi and the desired output d (from the teacher) for the current training example:

Δwi = 0.1 · xi · d

Notation: Δwi ("delta wi") means "change in wi".

Notice that if xi and d are either both +1 or both -1, the weight wi will increase. If they have opposite signs, the weight will decrease. If xi is zero, wi will not change.

This is a very simple learning rule that doesn't work for networks with hidden layers. Most neural nets use something more complex, but the form is similar.
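The rule Δwi = 0.1 · xi · d, sketched in code with outputs as +1/-1 as in the dog bite network (function name is mine; the 0.1 rate comes from the slide):

```python
def apply_learning_rule(weights, outputs, desired, rate=0.1):
    # delta w_i = rate * x_i * d: a weight grows when x_i agrees in sign
    # with d, shrinks when they disagree, and is unchanged when x_i is 0.
    return [round(w + rate * x * desired, 1) for w, x in zip(weights, outputs)]

w = apply_learning_rule([1.0, 1.0, 1.0], outputs=[+1, -1, +1], desired=+1)
print(w)   # [1.1, 0.9, 1.1]: agreeing experts gain, the disagreeing one loses
```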
Backpropagation learning rule: allows for hidden layers
Error signal: mean squared error, (d − y)²