1 of 83

San Francisco AI

SF AI & Emerging Tech

SF East Bay AI and Emerging Tech (Berkeley)

Organizers:

Mia Dand

Manas Mudbari

Federico Gobbi

Rae

2 of 83

  • Research & Strategic advisory firm
  • Innovation at Scale w/ AI & Emerging Tech
  • Responsible & Ethical AI (governance/policy, best practices, COEs)
  • Diversity & Inclusion

100 Women in AI Ethics

Predictions for 2019

3 of 83

AI/Machine Learning 101: Demystifying DNNs

Rosie Campbell, Assistant Director of the Center for Human-Compatible AI (CHAI) at UC Berkeley. Rosie previously worked as a research engineer at the BBC and co-founded a thriving futurist group in the UK. She is an aspiring rationalist and effective altruist, and a pioneer in exploring how emerging technologies can shape a positive future.

4 of 83

Thank you!

More information:

Twitter: @RosieCampbell Email: rosiecampbell@berkeley.edu

PSA: Please take your empty plates/glasses with you :)

Take a few minutes to rate this meetup

5 of 83

Demystifying Deep Neural Networks

@RosieCampbell

6 of 83

CONTENTS

  1. Introduction
  2. Neural Networks vs. Conventional Computing
  3. What is a neuron?
  4. What is a neural network?
  5. Training the network
  6. Convolutional Neural Networks (ConvNets)
  7. Applications and pitfalls
  8. What’s changed since I wrote this talk
  9. Closing summary

@RosieCampbell

7 of 83

INTRODUCTION

Context:

  • The aim is to give you an intuition of how neural nets work
  • Foundational, ‘101’ level
  • I wrote this in 2016; a lot has moved on!

@RosieCampbell

8 of 83

WHO AM I?

BSc Physics (& Philosophy)

MSc Computer Science

Research Engineer at BBC R&D

Machine Learning PhD (Dropout)

Founder of Manchester Futurists

Assistant Director of the Center for Human-Compatible AI at UC Berkeley

@RosieCampbell

9 of 83

HISTORY

1940s - 1960s: Cybernetics

1980s - 1990s: Connectionism

~2006 - now: Deep Learning

@RosieCampbell

10 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

11 of 83

RECOGNIZING A CAT

Conventional Computing:

IF (furry) AND

IF (has tail) AND

IF (has 4 legs) AND

IF (has pointy ears) AND

Etc…

@RosieCampbell

12 of 83

RECOGNIZING A CAT

Neural Networks:

[Training images: some labelled 'Cats', some labelled 'Not cats']

@RosieCampbell

13 of 83

CONVENTIONAL COMPUTING VS NEURAL NETS

Source: kevinbinz.com

@RosieCampbell

14 of 83

A RULE OF THUMB

Neural Networks are:

Good at things that humans are good at (e.g. pattern matching)

Bad at things that computers are good at (e.g. maths)

@RosieCampbell

15 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

16 of 83

A TRIVIAL EXAMPLE

Shall I go to the festival?

Will the weather be nice?

What’s the music like?

Do I have anyone to go with?

Can I afford it?

Do I need to write my thesis?

Will I like the food?

@RosieCampbell

17 of 83

A SINGLE NEURON

Each input (Weather, Music, Company, Money) is multiplied by an Importance.

Is the total over a certain Threshold?

Answer

@RosieCampbell

18 of 83

A SINGLE NEURON

Factor   | Score (out of 4) | Importance (out of 4) | Score × Importance
Weather  | 3                | 4                     | 12
Music    | 2                | 2                     | 4
Company  | 4                | 2                     | 8
Money    | 2                | 1                     | 2

Total: 26. Is 26 > 25? Yes!
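A minimal sketch of this calculation in Python, using the numbers from the table:

    # Weighted sum of scores, compared against a threshold.
    scores     = {"weather": 3, "music": 2, "company": 4, "money": 2}  # out of 4
    importance = {"weather": 4, "music": 2, "company": 2, "money": 1}  # out of 4
    threshold  = 25

    total = sum(scores[k] * importance[k] for k in scores)  # 12 + 4 + 8 + 2 = 26
    print("Go to the festival?", "Yes!" if total > threshold else "No")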

@RosieCampbell

19 of 83

A SINGLE NEURON

Same calculation as before: total 26, threshold 25 → Yes!

Now move the threshold to the other side of the equation...

@RosieCampbell

20 of 83

A SINGLE NEURON

The threshold becomes an extra input fixed at 1, with importance -25:

Factor   | Score (out of 4) | Importance (out of 4) | Score × Importance
Weather  | 3                | 4                     | 12
Music    | 2                | 2                     | 4
Company  | 4                | 2                     | 8
Money    | 2                | 1                     | 2
(Bias)   | 1                | -25                   | -25

Total: 1. Is 1 > 0? Yes!

@RosieCampbell

21 of 83

A SINGLE NEURON

Inputs × Weights → added up, together with a Bias → Activation function → Output

output = activation(weight₁ × input₁ + ... + weightₙ × inputₙ + bias)
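The same thing as a minimal Python sketch (the function names are illustrative):

    def neuron(inputs, weights, bias, activation):
        # Multiply each input by its weight, add them all up with the bias,
        # then pass the total through the activation function.
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        return activation(total)

    # The festival example: a step activation, with the threshold folded into the bias.
    step = lambda total: 1 if total > 0 else 0
    print(neuron([3, 2, 4, 2], [4, 2, 2, 1], -25, step))  # 1, i.e. Yes!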

@RosieCampbell

22 of 83

A SINGLE NEURON

The Bias can be treated as just another Input (fixed at 1) with its own Weight; everything else is unchanged: Inputs × Weights → summed → Activation function → Output

@RosieCampbell

23 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

24 of 83

A NEURAL NETWORK

Input layer → Hidden layer → Output layer

Each of these blobs is a neuron

@RosieCampbell

25 of 83

A DEEP NEURAL NETWORK

Input layer → Hidden layer → Hidden layer → Output layer

Each of these blobs is a neuron

@RosieCampbell

26 of 83

ACTIVATION FUNCTIONS

Step: what we've used so far

Sigmoid: popular historically

ReLU: popular these days
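The graphs aren't reproduced here, but as a minimal sketch of the formulas:

    import math

    def step(x):
        # Outputs 0 or 1: what we've used so far.
        return 1.0 if x > 0 else 0.0

    def sigmoid(x):
        # Smoothly squashes any number into (0, 1): popular historically.
        return 1.0 / (1.0 + math.exp(-x))

    def relu(x):
        # max(0, x): popular these days.
        return max(0.0, x)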

@RosieCampbell

27 of 83

A SINGLE NEURON

The same single neuron: Inputs × Weights, plus the Bias, through the Activation function (those graphs go there) → Output

@RosieCampbell

28 of 83

Isn’t it all just simple arithmetic then?!

@RosieCampbell

29 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

30 of 83

TRAINING THE NETWORK

  1. Randomly initialise the network weights and biases
  2. Get a ton of labelled training data
  3. For every piece of training data:
      • Check whether the network gets it right
      • If not, how wrong was it?
      • Nudge the weights a little to increase the probability of the network getting the answer right
  4. Repeat

(The short version)
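A minimal sketch of that loop for a single neuron (a toy perceptron-style update rather than full backpropagation, which comes later; the training data is a placeholder):

    import random

    data = [([3, 2, 4, 2], 1), ([1, 0, 1, 0], 0)]        # 2. labelled training data (placeholder)
    weights = [random.uniform(-1, 1) for _ in range(4)]  # 1. random initialisation
    bias = random.uniform(-1, 1)
    learning_rate = 0.1                                  # size of each nudge

    for epoch in range(100):                             # 4. repeat
        for inputs, target in data:                      # 3. for every piece of training data
            total = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if total > 0 else 0
            error = target - prediction                  # right? if not, how wrong?
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)] # nudge the weights a little
            bias += learning_rate * error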

@RosieCampbell

31 of 83

TRAINING THE NETWORK

[Diagram: a picture of a cat fed through the network: input layer → hidden layer → hidden layer → output layer.]

62% Dog

38% Cat

It should be 100% Cat :(

(The short version)

@RosieCampbell

32 of 83

TRAINING THE NETWORK

[Diagram: the same network and cat picture.]

62% Dog

38% Cat

It should be 100% Cat :(

Go backwards and nudge weights to increase Cat probability

(The short version)

@RosieCampbell

33 of 83

TRAINING THE NETWORK

(The long version)

How do we know how wrong the network is?

We measure the difference between the network's output and the correct output using the 'Loss Function'

@RosieCampbell

34 of 83

THE LOSS FUNCTION

Sometimes called the 'error', 'energy' or 'cost' function

A simple example is 'mean squared error'
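As a formula, for correct outputs y and network outputs ŷ over n examples: MSE = (1/n) · Σ (yᵢ − ŷᵢ)². A minimal sketch:

    def mean_squared_error(targets, predictions):
        # Average of the squared differences between what the network
        # should output and what it does output.
        n = len(targets)
        return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n

    # The network said 38% Cat / 62% Dog when it should have said 100% Cat:
    print(mean_squared_error([1.0, 0.0], [0.38, 0.62]))  # 0.3844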

@RosieCampbell

35 of 83

THE LOSS FUNCTION

Just think of it as:

The difference between what the network should output and what it does output

The goal of training is to find values for the weights and biases that minimize the loss function

@RosieCampbell

36 of 83

MINIMIZING THE LOSS FUNCTION

Source: firsttimeprogrammer.blogspot.co.uk

[Plot: Loss (vertical axis) against Weights (horizontal axis). Starting here, high on the curve; we want to get to here, the lowest point.]

@RosieCampbell

37 of 83

MINIMIZING THE LOSS FUNCTION

[Plot: starting here on the loss curve... where is the lowest point?!]

@RosieCampbell

38 of 83

GRADIENT DESCENT

Find the direction of the steepest slope downwards, and take a small step by nudging the weights
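A minimal one-dimensional sketch (the loss curve here is a stand-in with its lowest point at w = 3; real networks have millions of weights):

    def loss(w):
        return (w - 3) ** 2          # stand-in loss curve

    def slope(f, w, eps=1e-6):
        # Numerical estimate of the gradient at w.
        return (f(w + eps) - f(w - eps)) / (2 * eps)

    w = -4.0                         # starting here
    learning_rate = 0.1
    for _ in range(100):
        w -= learning_rate * slope(loss, w)  # small step down the steepest slope
    print(w)                         # close to 3.0, the lowest point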

@RosieCampbell

39 of 83

GRADIENT DESCENT

[3D plot: the loss surface over the network's Cat/Dog predictions. Starting here, at a random point on the surface.]

@RosieCampbell


47 of 83

GRADIENT DESCENT

[3D plot: after repeated downhill steps from the starting point, we reach the bottom of the surface.]

Can't get any lower. We made it!

@RosieCampbell

48 of 83

Back Propagation

  • We start at the output layer and work backwards through the network
  • We calculate the slope at each layer using differentiation
  • We then nudge all the weights a little in the direction we calculated
  • The amount we nudge is determined by the learning rate
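A minimal sketch of one such backward pass, for a toy network with one hidden neuron and sigmoid activations (all numbers are illustrative):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # A tiny network: one input -> one hidden neuron -> one output neuron.
    x, target = 0.5, 1.0
    w1, b1, w2, b2 = 0.4, 0.1, -0.3, 0.2
    learning_rate = 0.5

    # Forward pass.
    h = sigmoid(w1 * x + b1)               # hidden activation
    y = sigmoid(w2 * h + b2)               # network output
    loss = (y - target) ** 2

    # Backward pass: apply the chain rule from the output layer backwards.
    dz2 = 2 * (y - target) * y * (1 - y)   # slope at the output neuron
    dw2, db2 = dz2 * h, dz2
    dz1 = dz2 * w2 * h * (1 - h)           # slope one layer back
    dw1, db1 = dz1 * x, dz1

    # Nudge all the weights a little, scaled by the learning rate.
    w2 -= learning_rate * dw2; b2 -= learning_rate * db2
    w1 -= learning_rate * dw1; b1 -= learning_rate * db1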

@RosieCampbell

49 of 83

Phew!

@RosieCampbell

50 of 83

IN PRACTICE...

TensorFlow takes care of (most of) it! 🎉

  • Google’s open source Python Machine Learning library
  • Inbuilt functions to deal with the tricky maths!
  • Good documentation and lots of tutorials

(Other libraries are available...)
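A minimal sketch using TensorFlow's Keras API (the layer sizes and the random training data are placeholders):

    import numpy as np
    import tensorflow as tf

    # A tiny fully-connected network: 4 inputs -> 8 hidden neurons -> 1 yes/no output.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    # TensorFlow handles the loss function, the gradients and backpropagation.
    model.compile(optimizer="adam", loss="mean_squared_error")

    X = np.random.rand(100, 4)               # placeholder training data
    y = (X.sum(axis=1) > 2.0).astype(float)  # placeholder labels
    model.fit(X, y, epochs=10, verbose=0)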

@RosieCampbell

51 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

52 of 83

CONVOLUTIONAL NEURAL NETWORKS

ConvNets take advantage of structure in input data

Images have a 2D structure

So ConvNets are great for image processing �and computer vision tasks

@RosieCampbell

53 of 83

IMAGES ARE 2D ARRAYS OF NUMBERS

Each pixel is just a number (e.g. brightness), for example:

0    0    0
0.8  0    0
1    0    0

@RosieCampbell

54 of 83

CONVOLUTION

Pass an array of numbers (known as a 'kernel filter'), e.g.:

1  0  1
0  1  0
1  0  1

over every pixel. Multiply and add together to get the new value at that pixel.

Source: ufldl.stanford.edu
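A minimal sketch of the operation in plain Python (no padding, so the output is smaller than the input; strictly speaking this is cross-correlation, which is what most libraries implement under the name 'convolution'):

    def convolve(image, kernel):
        # Slide the kernel over the image; at each position, multiply the
        # overlapping numbers together and add them up.
        kh, kw = len(kernel), len(kernel[0])
        out = []
        for i in range(len(image) - kh + 1):
            row = []
            for j in range(len(image[0]) - kw + 1):
                row.append(sum(image[i + m][j + n] * kernel[m][n]
                               for m in range(kh) for n in range(kw)))
            out.append(row)
        return out

    kernel = [[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]]
    image  = [[0.0, 0.0, 0.0, 1.0],
              [0.8, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]]
    print(convolve(image, kernel))  # a 2x2 output array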

@RosieCampbell

55 of 83

EXAMPLES

Images by Michael Plotke (Wikimedia Commons)

Original: (unfiltered image)

Blur (kernel scaled by 1/9):

1  1  1
1  1  1
1  1  1

Sharpen:

 0  -1   0
-1   5  -1
 0  -1   0

Edge detect:

-1  -1  -1
-1   8  -1
-1  -1  -1

@RosieCampbell

56 of 83

A NEURAL NET

A single neuron, step by step:

  1. Take some inputs
  2. Multiply by weights
  3. Add them all up (with the bias)
  4. Put through the activation function
  5. Output result

@RosieCampbell

57 of 83

A CONVNET

The same steps, step by step:

  1. Take some inputs (arrays instead of single numbers)
  2. Multiply by weights: a shared kernel filter of weights (a 3×3 grid of Ws, shared across positions) is applied by the convolution operator to each input array
  3. Add them all up (with the bias)
  4. Put through the activation function
  5. Output result (an array)

@RosieCampbell

58 of 83

IMAGENET CHALLENGE

Source: kaggle.com

@RosieCampbell

59 of 83

IMAGENET CHALLENGE

[Chart: percentage error by year, with the point where the deep ConvNet was introduced marked.]

Until 2011, a good result was 25% error.

In 2012, ConvNets were introduced and the error plummeted to just 15.3%!

@RosieCampbell

60 of 83

IMAGENET CHALLENGE

[Chart: percentage error by year, with the deep ConvNet's introduction marked.]

@RosieCampbell

61 of 83

INSIDE A CONVNET

[Feature visualisations from the early layers, mid layers and later layers.]

Source: xlgps.com

@RosieCampbell

62 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

63 of 83

SEPARATING STYLE FROM CONTENT

Source: Gatys et al, ‘A Neural Algorithm of Artistic Style’, 2015

@RosieCampbell

64 of 83

SEPARATING STYLE FROM CONTENT

@RosieCampbell

65 of 83

ADVERSARIAL EXAMPLES

[Images: a photo classified as Truck, plus an imperceptible distortion, is classified as Ostrich.]

Source: karpathy.github.io

@RosieCampbell

66 of 83

ADVERSARIAL EXAMPLES

[Image classified as 100.0% Goldfish]

Source: karpathy.github.io

@RosieCampbell

67 of 83

GOOGLE’S DEEP DREAM

  1. Using a trained ConvNet, run the process 'in reverse'
  2. Tell the network "whatever you detect, enhance it!"
  3. Instead of nudging the weights, nudge the image
  4. Repeat
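A minimal sketch of that loop in TensorFlow (the model, layer name and step size are illustrative; real Deep Dream adds refinements like processing the image at multiple scales):

    import tensorflow as tf

    # 1. A trained ConvNet; we keep one intermediate layer's activations.
    base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
    dream_model = tf.keras.Model(inputs=base.input,
                                 outputs=base.get_layer("mixed3").output)

    image = tf.random.uniform((1, 224, 224, 3))   # placeholder starting image

    for _ in range(100):                          # 4. repeat
        with tf.GradientTape() as tape:
            tape.watch(image)
            # 2. "whatever you detect, enhance it!": the loss is the activation strength.
            loss = tf.reduce_mean(dream_model(image))
        grad = tape.gradient(loss, image)
        # 3. nudge the image (not the weights) uphill on that loss.
        image = image + 0.01 * grad / (tf.norm(grad) + 1e-8)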

@RosieCampbell

68 of 83

GOOGLE’S DEEP DREAM

Before

After

Source: fromthegrapevine.com

@RosieCampbell

69 of 83

Source: telegraph.co.uk

@RosieCampbell

70 of 83

@RosieCampbell

71 of 83

Source: killscreen.com

@RosieCampbell

72 of 83

So many dogs!?

@RosieCampbell

73 of 83

  • Don’t expect neural networks to be objective by default
  • Outputs are influenced by a range of factors
  • They can learn proxies instead of real insight
  • They can amplify structural bias and inequality
  • They do what you say, not what you mean

Pitfalls

@RosieCampbell

74 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

75 of 83

AI ETHICS & SAFETY

  • Fairness, Accountability and Transparency movement
  • Efforts to improve inclusivity and diversity in the field
  • 'Human-in-the-loop'
  • Public awareness of the effects of AI
  • Research on transformative and powerful AI
  • More discussion of risks vs benefits

@RosieCampbell

76 of 83

ORGANIZATIONS

@RosieCampbell

77 of 83

CONTINUED BREAKTHROUGHS

  • Superhuman abilities in Go and Shogi
  • Expert human abilities in DOTA 2
  • Near human abilities in speech recognition
  • Near human abilities in robotic bipedal locomotion
  • New approaches such as GANs, Inverse Reinforcement Learning, Fuzzy Logic, etc.

@RosieCampbell

78 of 83

CONTENTS

  • Introduction
  • Neural Networks vs. Conventional Computing
  • What is a neuron?
  • What is a neural network?
  • Training the network
  • Convolutional Neural Networks (ConvNets)
  • Applications and pitfalls
  • What’s changed since I wrote this talk
  • Closing summary

@RosieCampbell

79 of 83

STUFF WE HAVEN'T COVERED...

  • Unsupervised learning
  • Reinforcement Learning
  • Pooling
  • Strides
  • Stochastic & batch training
  • One-hot encoding
  • Dropout
  • Regularisation
  • Recurrent neural nets
  • Generative adversarial nets
  • Transfer learning

...and tons more.

BUT

@RosieCampbell

80 of 83

@RosieCampbell

81 of 83

Construct these with TensorFlow

@RosieCampbell

82 of 83

Construct these with TensorFlow

Good Luck!

@RosieCampbell

83 of 83

THANK YOU!

Blog post

bit.ly/deep-neural-networks

Resources

tensorflow.org

karpathy.github.io

deeplearningbook.org

dataskeptic.com/podcast

neuralnetworksanddeeplearning.com

online.stanford.edu/courses/cs229-machine-learning

kadenze.com/courses/creative-applications-of-deep-learning-with-tensorflow-iv

@RosieCampbell