1 of 80

Machine Learning for Social Causes (Part I)

2 of 80

Introduction

Wei Pin

Charlton

3 of 80

About DSC

We aim to make a difference in society, pushing our mission of #TechforGood through developing software solutions for Non-Profit Organizations and running events and workshops to promote the learning of technology skills among the student population

Before we begin, I would like to introduce our CCA, DSC, which stands for Developer Student Club. We are a relatively new CCA, founded in 2019. Our aim is to harness the power of technology to bring social good to the people and not-for-profit institutions around us. For instance, we have a dedicated external team of student software engineers who build innovative, problem-solving software solutions, as well as an internal team in charge of running events like the one you are participating in now. Our main goal for these workshops is to promote emerging technologies such as deep learning, and to encourage you to apply these skills to help the people around you.

4 of 80

5 of 80

By the end of this workshop, you will have learnt about:

Machine Learning and its applications
Types of Machine Learning
The brain behind ML: Neural Networks
Convolutional Neural Networks

6 of 80

Win $10 GrabFood Voucher

Kahoot Quiz at the end!

**Not sponsored by Grab although we wish it was.

7 of 80

What is Machine Learning To You?

Give us your answer at the URL provided in the Zoom chat!

8 of 80

Machine Learning

/məˈʃiːn ˈləːnɪŋ/

[noun]

Development of computer systems that are able to learn for themselves without explicit instructions

9 of 80

Applications of Machine Learning

Security and Surveillance

Medical Industry

Digital Media & Intelligence

Self-Driving Vehicles

Robotics and AI

In today’s world, you have probably heard about intelligent robots, self-driving cars and many other cool technologies that didn’t really exist decades ago. Let us take a quick look at some modern applications of machine learning to appreciate the importance and widespread use of this technology.

Security and Surveillance - Using facial recognition and behavioural detection methods to identify abnormal activities such as violence or theft in real time

Medical Industry - Medical imaging analysis (MRI/X-ray)

Digital Media & Intelligence -

-> Search Engine Optimization

-> Advertisement targeting (Recommend things that you would most likely want to buy based on the data collected about you)

-> Tiktok FYP (Shows you the content that you are most interested in),

-> Siri (voice recognition and natural language processing, which we will also be having a workshop on!)

Self-Driving Vehicles - These have been getting a lot of hype in recent years. Companies such as Tesla incorporate deep learning into their vehicles to enable self-driving and route recognition.

10 of 80

Traditional Programming

[Diagram: Data and Rules go into the Computer Machine, which returns Answers. Example activities: Walking, Cycling, Running]

To appreciate machine learning, we first have to look at how traditional programming works. As you can see, traditional programming takes two important inputs that the human has to provide: the rules and the data. The data here is the set of images of a girl walking, running and cycling. The rules tell the computer how to handle or classify this data into different categories. For example, in a hypothetical situation, if the girl’s speed is less than 4 km/h, the computer determines that she is walking; if she is moving at 4 km/h or faster but less than 12 km/h, it determines that she is running; and if her speed is 12 km/h or more, it classifies her as cycling. As expected, the computer returns a set of answers based on the data and rules we provided, as shown on the slide. For those who are not familiar with programming, this is a simple explanation of how traditional programming works. Quite a bit of human intervention is needed: we have to tell the computer exactly what it should do.
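The walking/running/cycling rules described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample speeds are made up, and treating a speed of exactly 4 or 12 km/h as the faster activity is an assumption.

```python
def classify_activity(speed_kmh: float) -> str:
    """Hand-written rules: the programmer chooses the thresholds."""
    if speed_kmh < 4:
        return "Walking"
    if speed_kmh < 12:
        return "Running"
    return "Cycling"


# Hypothetical data: measured speeds in km/h.
speeds_kmh = [3.0, 8.5, 20.0]

# Data + rules go in, answers come out.
answers = [classify_activity(speed) for speed in speeds_kmh]
print(answers)
```

Notice that every threshold had to be typed in by a human. That is the part machine learning tries to take over.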

11 of 80

Machine Learning

[Diagram: Data & Answers (cat images labelled CAT) go into the Computer Machine (+ML), which returns Rules]

Rules: will identify the distinct patterns/features in a cat image

Now, let us flip things around. We have a similar computer machine here, but this time it is equipped with machine learning capabilities. Let us look at the differences. Instead of providing rules and data, we provide data and answers, as shown here. For the data, we have many different images of cats, each labelled to show that it is a cat. The images are the data and the labels are the answers, and both are passed into the computer machine. Instead of returning answers, the machine returns a set of rules. With this set of rules, it can identify the distinct patterns and features in a cat image. The next slide shows why this is useful.
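To make "data and answers in, rules out" concrete, here is a minimal sketch. It uses speeds rather than cat images because they fit in a few lines; the labelled examples, the function names and the "halfway between neighbouring classes" strategy are all assumptions chosen for illustration, not how an image model actually learns.

```python
def learn_thresholds(examples):
    """Learn speed boundaries from labelled examples.

    examples: list of (speed_kmh, label) pairs.
    Returns (upper_bound, label) rules ordered from slowest to fastest class.
    """
    by_label = {}
    for speed, label in examples:
        by_label.setdefault(label, []).append(speed)

    # Order the classes by their average speed.
    ordered = sorted(by_label, key=lambda lab: sum(by_label[lab]) / len(by_label[lab]))

    rules = []
    for slower, faster in zip(ordered, ordered[1:]):
        # Place the boundary halfway between neighbouring classes.
        boundary = (max(by_label[slower]) + min(by_label[faster])) / 2
        rules.append((boundary, slower))
    rules.append((float("inf"), ordered[-1]))
    return rules


def predict(rules, speed):
    """Apply the learnt rules to a new speed."""
    for upper_bound, label in rules:
        if speed < upper_bound:
            return label


# Hypothetical labelled data: the "data" (speeds) and the "answers" (labels).
labelled = [(2.0, "Walking"), (3.5, "Walking"), (6.0, "Running"),
            (10.0, "Running"), (15.0, "Cycling"), (25.0, "Cycling")]

rules = learn_thresholds(labelled)
print(rules)
print(predict(rules, 5.0))
```

Nobody typed in the boundaries this time. They came from the examples, so different examples would give different rules.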

12 of 80

Machine

(Trained with Images of Cat)

CAT

Uh oh..

NOT CAT

13 of 80

Machine Learning is an ongoing process

Which is why the more we train it, the better our machine gets

(More Diverse Data -> Greater Accuracy)

14 of 80

Let’s Try It Out!

http://bit.ly/DSCcollab1

15 of 80

Machine Learning

Categories of Machine Learning

16 of 80

Supervised Learning

Learning through the use of labelled data

Example 1: Classification (Computer Vision)

Cat

Cat/Dog?

Dog

17 of 80

Supervised Learning

Learning through the use of labelled data

Example 2: Regression

X-Values	Y-Values
x1	y1
x2	y2
x3	y3
. . .	. . .
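
As a rough sketch of regression, the snippet below fits a straight line to a small table of x and y values using the least-squares formulas. The numbers are made up for illustration and happen to follow y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: each x-value comes with its y-value.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict y for an unseen x
print(slope, intercept, prediction)
```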

18 of 80

What if we do not have enough labelled data to train the machine?

19 of 80

Unsupervised Learning

Discovering hidden patterns without human intervention

Example: Clustering

Each point does not have a specific cluster attached to it at the start

Algorithm discovers which cluster each point should belong to on its own
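
One common clustering algorithm is k-means. The sketch below runs it on a handful of made-up one-dimensional points; the points, the starting centres and the number of rounds are assumptions chosen to keep the example small.

```python
def k_means_1d(points, centres, rounds=10):
    """Tiny k-means for 1-D points: assign to nearest centre, then re-centre."""
    for _ in range(rounds):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its points (keep it if it has none).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters


# Hypothetical unlabelled data: no point has a cluster attached at the start.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]

centres, clusters = k_means_1d(points, centres=[0.0, 5.0])
print(centres)
print(clusters)
```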

20 of 80

Advantages of Unsupervised Learning

Helpful for finding useful insights from the data.

Closely imitates how humans learn from their own experiences.

Works on unlabelled and uncategorised data, which saves manual labelling work and expense.

Disadvantages of Unsupervised Learning

Results may be less accurate.

Time-consuming during the learning phase.

21 of 80

What are the tools used in Machine Learning?

Machine Learning Framework

Programming Language

22 of 80

So how does Machine Learning work?

23 of 80

3 Learning Objectives

1. Neurons, Neural Layers and Networks

2. How Neural Networks learn

3. What is backpropagation and how does it work?

24 of 80

Credits: GumGum

25 of 80

Analogy

Consensus:

9

Consensus: Not 9

+Bias

Jack

Jordan

Jane

Weight of opinion = How much this person’s opinion matters

+Bias

Alice

Tim

Ken

IT IS NOT 9!

IT IS 9

I hope the video was helpful. To further aid your understanding, I will use an analogy that should help familiarise you with the system.

Imagine you have a handwritten image of the number 9 on the table, and your aim is to determine whether this digit is a 9 or not a 9. Of course, all of us here know the correct answer; if not, I would be concerned. So you show it to 3 of your friends, all of whom we assume have no idea what a 9 looks like. Each of them has their own biases based on their experiences and other factors, so we need to take note of that as well.

Your 3 friends, Jack, Jordan and Jane, then each whisper their opinion to 3 other friends, who similarly have their own biases. Finally, the last 3 friends we see here, Alice, Tim and Ken, form a consensus as to whether the digit is a 9 or not a 9.

Assuming their answer is no, what happens? We go backwards and tell them that the correct answer is 9, and they have to go back and discuss whose opinion was actually correct. The opinions of those who said it is not a 9 will then hold less weight, since they were wrong, while the opinions of those who said it is a 9 will hold greater weight. After enough rounds of discussion, they eventually form the correct consensus.

26 of 80

How is the analogy applicable?

Consensus:

9

Consensus: Not 9

+Bias

0.32

0.98

0.40

0.30

0.50

0.65

27 of 80

A Neuron/Node

0.00

This number is called “Activation”

0.00

1.00

0.45

The number represents the greyscale value of a particular pixel

28 of 80

How it will turn out..

29 of 80

And what is a neural layer?

0.32

0.98

0.40

0.32

.

Neural Layer: A collection of 'neurons'/nodes operating together in the same column of a neural network

30 of 80

Neural Networks

A computer system modelled on the human brain. Made up of multiple neural layers

/ˈnjʊər(ə)l ˈnɛtwəːk/

[noun]

Human Brain Neuron Network

ML Neural Network

31 of 80

A neural network

[Diagram: Input Layer (activations 0.32, 0.98, 0.40, 0.32, ...) -> Hidden Layer (basically many layers here) -> Output Layer (9 neurons, labelled 1, 2, ..., 8, 9)]

32 of 80

Visualising Neural Network (Simplified)

Note: This Neural Network has already been trained

33 of 80

What is in the hidden layer?

34 of 80

[Diagram: the same network, with input activations 0.32, 0.98, 0.40, 0.32, ... and 9 output neurons labelled 1, 2, ..., 8, 9]

35 of 80

How different layers interact with each other

0.35

0.09

x

0.80

0.20

Finding/Optimising the correct weight value for each of the channels (the lines that you see) is our aim in ML

Weights

Control the signal/strength of a connection

between 2 neurons

0.3

0.57

0.8

0.70

x = F((A₁W₁ + A₂W₂ + … + AₙWₙ) + Bias)

x = F((0.2*0.7 + 0.8*0.8 + 0.35*0.3 + 0.09*0.57) + Bias)

x is the result we get after passing the sum of the products of each neuron's activation and its weight, plus the bias, into an activation function F

Bias is a learnable constant added to the weighted sum, which shifts the point at which the neuron activates
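
The calculation for x can be sketched in Python using the activations and weights from this slide. Choosing sigmoid as the activation function F and setting the bias to 0 are assumptions made only so the example runs; the slide does not specify either.

```python
import math


def sigmoid(z):
    """One possible activation function F (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(activations, weights, bias):
    """x = F((A1*W1 + A2*W2 + ... + An*Wn) + Bias)."""
    weighted_sum = sum(a * w for a, w in zip(activations, weights))
    return sigmoid(weighted_sum + bias)


# Activations and weights taken from the slide.
activations = [0.2, 0.8, 0.35, 0.09]
weights = [0.7, 0.8, 0.3, 0.57]

weighted_sum = sum(a * w for a, w in zip(activations, weights))
x = neuron_output(activations, weights, bias=0.0)  # bias of 0 is assumed

print(round(weighted_sum, 4))  # 0.9363
print(x)
```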

36 of 80

Activation Function

Linear

Non-Linear

Assume that the red and blue circles are the items we are trying to classify

37 of 80

Common Types of Activation Functions

38 of 80

Which Activation Functions do we choose?

Problem Type	Activation Function
Binary Classification	Sigmoid Activation
Multiclass Classification	Softmax Activation
Multilabel Classification	Sigmoid Activation
Regression	Linear Activation
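
For reference, here is a minimal sketch of the three activation functions named on this slide, written in plain Python. The raw output values are made up for illustration.

```python
import math


def sigmoid(z):
    """Squashes one value into (0, 1): binary and multilabel classification."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turns a list of values into probabilities that sum to 1: multiclass."""
    largest = max(zs)
    exps = [math.exp(z - largest) for z in zs]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def linear(z):
    """Leaves the value unchanged: regression."""
    return z


raw_outputs = [2.0, 1.0, 0.1]  # hypothetical outputs of the last layer

print(sigmoid(0.0))  # 0.5
probs = softmax(raw_outputs)
print(probs)
multilabel = [sigmoid(z) for z in raw_outputs]
print(multilabel)
```

Softmax makes the classes compete for a total probability of 1, while sigmoid scores each label independently. That is why multilabel problems use sigmoid again.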

39 of 80

x = F((A1*W1 + A2*W2 + A3*W3 + A4*W4) + Bias)

40 of 80

You probably wonder..

Why 2 hidden layers, and why that number of neurons in each layer?

Deep learning is a combination of art and science. There is no fixed formula for the number of hidden layers or the number of neurons in each; they are design choices. You need to fine-tune and test to find the numbers that give the best training result

41 of 80

Let’s Visualize it!

Multilayer Perceptron Visualisation on handwritten numbers:

https://bit.ly/DSCNeuralHandwriting

42 of 80

How do Neural Networks learn?

43 of 80

TRAINED Neural Network

[Diagram: input activations 0.32, 0.98, 0.40, 0.32, ... -> Hidden Layer -> output neurons labelled 1, 2, ..., 8, 9 with activations 0.12, 0.32, 0.20, 0.75, ...]

44 of 80

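UNTRAINED Neural Network

[Diagram: input activations 0.32, 0.98, 0.40, 0.32, ... -> Hidden Layer -> output neurons labelled 1, 2, ..., 8, 9 with activations 0.81, 0.55, 0.62, 0.72, ... Result: Wrong!]

Here, the network is running for the first time, so it is making a random guess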

45 of 80

UNTRAINED Neural Network (Animated)

46 of 80

[Diagram: Hidden Layer -> output neurons labelled 1, 2, ..., 8, 9, compared side by side. Actual probability: 0.81, 0.55, 0.62, 0.72, ... Expected probability: 0.00, 1.00, ...]

47 of 80

How does it “improve” itself?

Cost/Loss Function

A technique we use to measure the performance/correctness of our algorithm/machine learning model

(Actual - Expected)²
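
As a small worked example, the snippet below applies (Actual - Expected)² to the four output values shown for the untrained network. The expected values are an assumption: they suppose the last of the four neurons is the correct answer.

```python
def squared_error_cost(actual, expected):
    """Sum of (Actual - Expected)^2 over all output neurons."""
    return sum((a - e) ** 2 for a, e in zip(actual, expected))


actual = [0.81, 0.55, 0.62, 0.72]   # outputs of the untrained network
expected = [0.0, 0.0, 0.0, 1.0]     # assumed: only the last neuron should fire

cost = squared_error_cost(actual, expected)
print(round(cost, 4))  # 1.4214
```

A perfect prediction would give a cost of 0, so a large value like this tells us the network is still far from the right answer.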

48 of 80

Cost/Loss Function

Quantifying the differences in expected vs actual result

49 of 80

Purpose of Training a neural network

To bring the cost/loss value as close to zero as possible

50 of 80

How it is achieved

Through repeatedly training the neural network with different data

a.k.a. telling the computer its mistakes and what it should do to re-adjust for a better outcome

Backpropagation

A way of propagating the total loss back into the neural network to know how much of the loss every node is responsible for

51 of 80

We cannot change the activation of the neurons directly. The only variables we can adjust are the weights and biases (the lines/channels)
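
Here is a minimal sketch of that idea for a single neuron with one input. The loss is propagated backwards with the chain rule to find out how much the weight and the bias are each responsible for it, and then both are nudged in the direction that reduces the loss. The sigmoid activation, starting values and learning rate are all assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(a_in, weight, bias):
    """Activation of the neuron for a given input activation."""
    return sigmoid(a_in * weight + bias)


def gradients(a_in, weight, bias, expected):
    """Chain rule for loss = (out - expected)^2."""
    out = forward(a_in, weight, bias)
    d_loss_d_out = 2.0 * (out - expected)
    d_out_d_z = out * (1.0 - out)          # derivative of the sigmoid
    d_loss_d_z = d_loss_d_out * d_out_d_z
    # z = a_in * weight + bias, so dz/dweight = a_in and dz/dbias = 1.
    return d_loss_d_z * a_in, d_loss_d_z


a_in, expected = 0.8, 1.0                       # hypothetical training example
weight, bias, learning_rate = 0.1, 0.0, 0.5     # hypothetical starting point

loss_before = (forward(a_in, weight, bias) - expected) ** 2
grad_w, grad_b = gradients(a_in, weight, bias, expected)

# Only the weight and bias are adjusted, never the activation itself.
weight -= learning_rate * grad_w
bias -= learning_rate * grad_b

loss_after = (forward(a_in, weight, bias) - expected) ** 2
print(loss_before, loss_after)
```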

52 of 80

Why is it called Backpropagation?

53 of 80

Addendum: For the curious mind (Optional)

Due to time constraints and the depth required, we will not cover the maths behind how neural networks learn. If you are interested, please feel free to visit this after the end of this workshop:

bit.ly/DSCGradientDescent

54 of 80

How does it work?

bit.ly/DSCTFplayground

55 of 80

To summarize:

56 of 80

Coding a Neural Network

https://bit.ly/DSCcollab2

57 of 80

Coding a Neural Network

58 of 80

Coding a Neural Network

Predicting y=x Graph

59 of 80

Coding a Neural Network

Loss and Optimizers

Loss - A mathematical way of measuring how "wrong" our predictions are

Optimizers - An algorithm to help us minimise loss

You can see that there are 2 new parameters here, namely loss and optimizer. So what are these 2 things? I'll go through them in a little more depth, although the algorithmic part can be a bit complicated. Simply put, loss is a mathematical way to measure how "wrong" our predictions are. How we calculate it depends on what our model is trying to predict. In this case, we are using "mean_squared_error", which is the average of the squared differences between our predicted values and the actual values, the same idea as the cost function mentioned earlier. I'll go through another way of calculating loss later. Optimizers use the changing loss values to decide how to tweak the weights so that we reach the lowest loss possible.
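
To show what loss and an optimizer do without any framework, here is a plain-Python sketch that learns the y = x graph. It is not the notebook's code: the data points, the learning rate, the number of epochs and the use of basic gradient descent as the optimizer are all assumptions.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and answers."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical training data on the line y = x.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0]
ys = [-1.0, 0.0, 1.0, 2.0, 3.0]

weight, bias = 0.0, 0.0     # the model is: prediction = weight * x + bias
learning_rate = 0.05

for epoch in range(500):
    predictions = [weight * x + bias for x in xs]
    errors = [p - y for p, y in zip(predictions, ys)]
    # Gradient of the mean squared error with respect to weight and bias.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # The "optimizer": step against the gradient to reduce the loss.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

final_loss = mean_squared_error([weight * x + bias for x in xs], ys)
print(weight, bias, final_loss)
```

After training, the weight should be close to 1 and the bias close to 0, which is exactly the line y = x.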

60 of 80

Common Problems of Machine Learning

Overfitting and Underfitting

Think about a student who simply does the same practice paper thousands and thousands of times. Sure, they'll be very familiar with the content, but note that in this context, they are doing the EXACT same paper over and over again. Firstly, there will inevitably come a point where the student has learnt everything they can from that paper and there is nothing left to learn. Doing the paper even more times will not change how well the student does in subsequent tests. However, an even more serious problem might occur: the student might subconsciously just remember all the answers without absorbing any of the content. When they do the paper again, they just regurgitate the answers they have memorised. This gives them a good score on the practice paper and makes them think they are improving when, in actual fact, they aren't learning anything and are simply remembering more and more of the answers. When a subsequent test comes up that doesn't have the exact questions from the practice paper, the student will not perform well, as they did not actually learn anything.

This phenomenon can occur in machine learning as well. That is, the computer has fitted the model so tightly to the data it was given that it performs well on any value it has already seen, but cannot give the correct answer for a value it has never seen before. Let's look at the diagrams for a better visualisation of how this happens.

On the right, we see the best-fit curve that we actually want from the given data points. In the centre, however, is an example of an overfit model. The model has forced the line to pass through every single data point it was given, producing a weird shape instead of the curve we wanted. This forced curve gives very different predictions. If we look at the same x value on both graphs, the correct best-fit curve gives us the value at the green line, but the overfit model returns the value at the red line, which is very different. Here, we can see how a model that has been trained too much on the same data can end up incorrect and poor at predicting other values.

The other end of this problem is called underfitting. This usually occurs when the model is too simple to capture the pattern in the data. In the diagram, we are using a straight line to fit what is obviously a curve, which gives very poor results even after training.
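
The student analogy can be turned into a deliberately extreme sketch: one "model" that only memorises the training answers, and one that has captured the underlying pattern. The data (following y = 2x) and both models are made up for illustration; real overfit models are less blatant than a lookup table.

```python
# Hypothetical training data following y = 2x.
train = {1.0: 2.0, 2.0: 4.0, 3.0: 6.0}
# Values the models have never seen.
unseen = {4.0: 8.0, 5.0: 10.0}


def memoriser(x):
    """Remembers the answers: perfect on seen inputs, clueless otherwise."""
    return train.get(x, 0.0)


def learnt_rule(x):
    """A model that captured the underlying pattern instead."""
    return 2.0 * x


def mean_squared_error(model, data):
    return sum((model(x) - y) ** 2 for x, y in data.items()) / len(data)


print(mean_squared_error(memoriser, train), mean_squared_error(memoriser, unseen))
print(mean_squared_error(learnt_rule, train), mean_squared_error(learnt_rule, unseen))
```

Both models score perfectly on the training data, so the training score alone cannot tell them apart. Only the unseen data reveals which one actually learnt something.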

61 of 80

Coding a Neural Network

How are images processed by computers?

Think about it this way:

Training Data:

Past Year Exam Papers

Validation Data:

Mock Exam Papers

Test Data:

The Actual Exam

Let's link back to the analogy of students doing practice papers to prepare for a final exam, with the student being the model itself. In this analogy, the training data is the past year exam papers, as before. The students can see previous questions and their corresponding answers, and learn from them to prepare for the final exam. Validation data can be likened to mock exam papers, which act like checkpoints. Every time the students finish the practice papers, they do the mock papers to see how much they have learnt so far and to gauge whether they are learning properly. This is how one epoch works: the model learns on the training data and then tests itself on the validation data to see how well it is performing. This helps us observe whether our model is improving or simply stagnating, where learning from the training data is no longer beneficial. Checking the model on data it was not trained on also helps us spot overfitting early. Finally, the test data is the actual exam: it is used once at the end to measure how well the model really performs.
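
A dataset can be divided into these three parts with a few lines of Python. This is only a sketch: the 70/15/15 split, the function name and the fixed random seed are assumptions, and the right proportions depend on how much data you have.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the samples, then cut them into train / validation / test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                    # past year exam papers
    val = shuffled[n_train:n_train + n_val]       # mock exam papers
    test = shuffled[n_train + n_val:]             # the actual exam
    return train, val, test


samples = list(range(100))  # hypothetical dataset of 100 examples
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before cutting matters: if the data were sorted by label, the test set could otherwise end up containing only one kind of example.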

62 of 80

Coding a Neural Network

How are images processed by computers?

Images are made of pixels of different colours.

Each pixel contains Red, Green, Blue (RGB) values to indicate what colour it should be

Red: 255, Green: 0, Blue: 0

Red: 0, Green: 255, Blue: 0

Red: 0, Green: 0, Blue: 255

Red: 120, Green: 120, Blue: 120

Red: 0, Green: 0, Blue: 0

Red: 255, Green: 255, Blue: 255

63 of 80

Coding a Neural Network

Simpler way to process some images

Grayscale (if colour is not important to the model)

Value: 0

Value: 255

64 of 80

Coding a Neural Network

Normalization

Transforms data to values between 0-1
Helps model better understand the minimum and maximum values of the input data
Makes computation easier and learning more accurate

Let's look at an analogy to see why this is useful!

65 of 80

Coding a Neural Network

Normalization

A

B

10 marks

20 marks

Max score: 20

Max score: 100

50%

100%

20%

10%

50%

Difference:

10%
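
The same idea in code: dividing by the maximum possible value puts everything on a common 0 to 1 scale. The pairing of scores below (A with 10 marks out of 20, B with 20 marks out of 100) is my reading of the slide, and the pixel values are made up.

```python
def normalise(value, max_value):
    """Scale a value into the range 0 to 1."""
    return value / max_value


# Raw marks suggest B (20) did better than A (10)...
score_a = normalise(10, 20)     # 0.5, i.e. 50%
score_b = normalise(20, 100)    # 0.2, i.e. 20%
# ...but on the same scale, A actually did better.
print(score_a, score_b)

# Pixel values range from 0 to 255, so we divide by 255.
pixels = [0, 120, 255]
normalised_pixels = [normalise(p, 255) for p in pixels]
print(normalised_pixels)
```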

66 of 80

Convolutional Neural Networks

A type of neural network used in image recognition and processing, specifically designed to process pixel data

/ˌkɒnvəˈluːʃ(ə)n(ə)l ˈnjʊər(ə)l ˈnɛtwəːk/

[noun]

67 of 80

Convolutional Neural Network

Adding Layers Together

68 of 80

Convolutional Neural Network Visualisation

https://bit.ly/DSCConvHandwriting

Let's see a quick visualisation of how this comes into play. I'll explain more about what the convolutional layer and max pooling layer do afterwards, but for now let's just look at a 3D representation of the CNN model.
Write any number at the top left, as in the earlier visualisation, and the network should light up. Hover over any cube in the network and you can see which neurons it is connected to in the previous layer. This means that the current neuron took the values of these connected neurons, multiplied them by their weights, and used the result to obtain its own value.
From this visualisation, you can see the layers mentioned. The first layer is the convolution layer. It is followed by a downsampling layer that uses the max pooling method, so we also refer to it as the max pooling layer; we will discuss it more later. Next come another convolution layer and another max pooling layer, followed by 2 fully connected layers, also known as dense layers. These 2 layers are exactly like the neural network layers we learnt about previously. The whole model is the same as the diagram we saw earlier, but this view makes it easier to see how the layers are connected and work together to work out which digit was written.
Now you may be asking why there is a need for a whole other type of neural network. Why do we need these convolution layers? Let's take a look at why a normal neural network wouldn't be as effective at recognising images and pixel data.

69 of 80

Convolutional Neural Networks

Why convolution?

The problem with conventional neural networks and image processing

Unable to account for different positions of the target

???

Algorithm is skewed to detect cats in the centre, but the cat isn't always there

We learnt earlier that in a neural network, each neuron passes on its information with a certain weight. Imagine now that each pixel in an image is a neuron. In a conventional neural network, each input neuron always refers to the same pixel position. This presents a problem in image recognition, because the object of interest will not always be in the same region of the image. In this example, when we feed these 4 cat images into our neural network, the learning process gives the pixels in the centre a much higher weight, since that is all the model has ever seen in training. The pixels outside the centre green box are taken into account less; in other words, their weights decrease.
This poses a problem when we feed in an image with a cat that is not in the centre. The neural network heavily prioritises the centre of the image and neglects the areas outside the green box. When it finds no cat in the green box, the model concludes that there is no cat in the image, which is false.

70 of 80

Convolutional Neural Networks

Why convolution?

Demo with the Number 1:

Let's see this problem happen in real time. We have 2 visualisation tools: one for the multilayer perceptron, or let's just call it a basic neural network, and one for the convolutional neural network. Let's test both with the number 1, which is a simple stroke. First, draw it in the centre of the drawing box for both.
Now test with the 1 at the side of the drawing box.
Given how simple the number 1 is, just a straight line, you'd expect the model to pick it up with decent accuracy. However, you can see that the basic NN is extremely poor at predicting this, while the CNN is much more accurate.
You may be trying other numbers at the side and finding that the CNN doesn't always pick them up either. That's mainly for 2 reasons. Firstly, no model is perfect, and every model is bound to make some wrong predictions. Secondly, it is important that the number you write is fairly big and not squashed up on one side, as most of the training data consisted of properly written numbers that were not deliberately placed at the side of the frame.
But in general, a CNN will perform much better than a basic NN in image recognition for this reason.

71 of 80

Convolutional Neural Networks

The Convolutional Layer

Use of filters for feature extraction

What are filters exactly?

Perform calculations on different parts of the image

Gives a "feature map"

New image with calculated results from the filter

[Animation: feature map values produced as the filter moves over the cat image: 165, 1, 170, 1, 2, 1, 240, 238, 1, 151, 160, 0]

Now, let's dig deeper into how a CNN works. The fundamental unit of a CNN is what we call a filter, or a kernel. Filters are applied to the image in a systematic manner to pick out key information, as determined by the model. A filter is applied by performing calculations on different parts of the image, each producing a new output number. The matrix of all these numbers forms another image, which we call the feature map.

So on the right here, we have a picture of a cat. Let's say hypothetically we have a filter that is able to detect whether there is a cat present in the selected part of the image. So the first value at the top left is 1 because there obviously is no cat there.

*PLAY ANIMATION*

Now, this filter will be systematically applied throughout the entire image as shown in this animation. As you can see, when the filter detects a cat, it will produce a higher value to indicate the presence of a cat in the feature map. This feature map will then be passed on to the next layer on the CNN.

*click*

Here's a better visualisation of how the filter is applied systematically throughout an image. It simply slides across slowly until the entire image is covered. The filter itself is a matrix smaller than the image, and its size is often up to our discretion. For those more interested in the math, it usually takes a dot product between the filter and the selected patch of the image to get the final number.
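
The sliding dot product can be sketched in plain Python. The tiny 4x4 "image" and the 2x2 vertical-edge filter below are made up for illustration; in a real CNN the filter values are learnt during training rather than written by hand.

```python
def apply_filter(image, kernel):
    """Slide a square filter over the image and return the feature map."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Dot product between the filter and the patch under it.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical greyscale image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# Hypothetical filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = apply_filter(image, vertical_edge)
print(feature_map)  # [[0, 510, 0], [0, 510, 0], [0, 510, 0]]
```

The feature map is large only in the middle column, which is exactly where the edge sits in the image.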

*next slide*

Each filter is often used to identify and pick out a specific type of feature. Now we can try it ourselves and visualise how these filters manipulate images to extract the important information. In fact, similar filters are commonly used by photo editing apps such as Photoshop to edit our pictures too!

72 of 80

Convolutional Neural Network

Filters

Each filter is often used to identify specific features such as edges or corners.

Try it yourself!

https://bit.ly/DSCFiltersHandsOn

But what if we want to identify something more? For example, eyes, noses, faces?

73 of 80

Convolutional Neural Network

Hierarchical Nature of Filters

Lower level filters identify lower level features

Passed on to higher level filters which identify higher level features comprising these lower level features

EYES

MOUTH

FACE

EYES FILTER

MOUTH FILTER

FACE FILTER

These filters that we talk about actually have a hierarchical nature, which is why we often have more than 1 convolutional layer. The first convolutional layer mainly looks at lower level features, making a feature map that is passed on to the next layer. The next layer will then use this feature map of lower level features to try and identify higher level features. Let's take a look at a real example to better understand what these lower level and higher level features mean.

So let's say we have a filter that is able to identify eyes

-> systematically applied to image to give this feature map

-> has eyes at these 2 points

Now let's say there is another low level filter that is able to identify mouths. Similarly systematically applied to image to give feature map

-> mouth at that 1 point

This produces 2 feature maps in the first convolutional layer. In the next convolutional layer, another filter leverages these 2 feature maps produced at the lower level. Such a filter could be a face filter that is able to detect faces. One way it might work is to look at the eye feature map to see if there are 2 eyes at a certain position, and then look at the mouth feature map to see if there is a mouth located just below those 2 eyes. By combining the information from 2 lower level features such as eyes and mouth, the CNN is able to detect higher level features such as faces. This is done repeatedly for different features, in higher and higher layers.

74 of 80

Max Pooling Layer

Don't images have a lot of pixels? Won't it take very long to run an algorithm on so many pixels?

Reduces amount of information to decrease computation time
Extracts the maximum value - able to maintain information while downsampling
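
A minimal sketch of max pooling in plain Python is shown below. The 4x4 feature map and the 2x2 window size are assumptions chosen to keep the example small.

```python
def max_pool(feature_map, size=2):
    """Keep only the largest value in each size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical feature map produced by a convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 16 values shrink to 4, yet the strongest response in each region survives, which is how the layer keeps the important information while downsampling.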

75 of 80

Other Pooling Layers

Are there other types of pooling?

Each type of pooling layer has its own advantages
Average Pooling Layer
Minimum Pooling Layer

76 of 80

Convolutional Neural Network

Adding Layers Together (Recap)

77 of 80

Hands-on Demo

Use the following link and copy the notebook onto your own Google Drive:

https://bit.ly/DSCSignLangDemo

78 of 80

Quiz!

Stand a chance to get a $10 GrabFood Voucher!

79 of 80

80 of 80

Thank you for joining us in this workshop!

The workshop will be recorded and uploaded to the DSC YouTube channel