1 of 18

Intro to Machine Learning

Girish Varma

IIIT Hyderabad

http://bit.ly/2tzcXHu

2 of 18

A Machine Learning Problem

Given an image of a handwritten digit, find the digit.

There is no well-defined function from the input to the output.

3 of 18

Programming vs Machine Learning

Machine Learning:

Find the handwritten digit in an image.

  • Collect (image, digit) pairs (dataset).
  • Train a machine learning model to fit the dataset.
  • Given a new image, apply the model to get the digit (testing or inference).

Programming:

Find the shortest path in an input graph G.

  • Implement Dijkstra's algorithm for shortest path in a programming language.

4 of 18

Dataset

  • Consists of (x, y) pairs, where x is the input and y is called the label.
  • Examples
    • MNIST: x is a 28x28 grayscale image of a handwritten digit, y is a digit from 0 to 9.
    • CIFAR10: x is a 32x32 color image, y is a label in {aeroplane, automobile, bird, cat, ...}. y is given as a number from 0 to 9, and there is a mapping between the numbers and the class names.
  • Divided into train, validation and test splits (a loading sketch follows below).
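
As a concrete illustration, here is a minimal sketch of loading the MNIST dataset and inspecting one (x, y) pair. It assumes the PyTorch and torchvision libraries are available; the paths and batch size are illustrative choices, not from the slides.

```python
# Minimal sketch: loading MNIST with torchvision (assumed dependency).
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Train and test splits; a validation split can be carved out of the train set.
train_set = torchvision.datasets.MNIST(root="./data", train=True, download=True,
                                       transform=transforms.ToTensor())
test_set = torchvision.datasets.MNIST(root="./data", train=False, download=True,
                                      transform=transforms.ToTensor())

x, y = train_set[0]          # one (image, label) pair
print(x.shape, y)            # torch.Size([1, 28, 28]) and a digit in 0..9

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
```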

5 of 18

Tensors

All data, intermediate outputs and learnable parameters are represented as tensors.

A machine learning model transforms an input tensor to an output tensor.

Tensors have a shape.

  • A tensor T with shape [10,10] is equivalent to a 10x10 matrix. It can be indexed by 2 numbers; T[i,j] is a real number.
  • Tensors can be 3D. T with shape [5, 10, 15] can be indexed by 3 numbers i, j, k (i <= 5, j <= 10, k <= 15).
  • Tensors can have arbitrary shapes. T with shape [100, 32, 32, 3] can represent 100 color images, each 32x32 in size (see the sketch below).
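
A small sketch of tensor shapes, using PyTorch as an assumed library (NumPy arrays behave the same way):

```python
# Minimal sketch of tensors and their shapes (PyTorch assumed).
import torch

T2 = torch.randn(10, 10)                # shape [10, 10], like a 10x10 matrix
print(T2.shape, T2[3, 7])               # T2[i, j] is a single real number

T3 = torch.randn(5, 10, 15)             # 3D tensor, indexed by i, j, k
images = torch.randn(100, 32, 32, 3)    # 100 color images of size 32x32
print(images.shape)                     # torch.Size([100, 32, 32, 3])
```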

6 of 18

Model

The function that maps the input to the output: y = f𝛉(x)

A model has learnable parameters, 𝛉.

  • Fit a line to a set of points.
    • Slope and offset are learnable parameters.
  • Fit a degree 4 polynomial.
    • Coefficients are learnable parameters.
  • Fit a Multilayered perceptron.
    • Weights and biases are learnable parameters (see the sketch below).
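
For instance, fitting a line or a degree-4 polynomial takes only a few lines; the sketch below uses NumPy's polyfit on made-up data (polyfit is not mentioned in the slides, just one convenient way to do the fit):

```python
# Minimal sketch: learnable parameters of simple models (NumPy assumed).
import numpy as np

x = np.linspace(0, 1, 50)
y = 2.0 * x + 0.5 + 0.05 * np.random.randn(50)   # noisy points along a line

slope, offset = np.polyfit(x, y, 1)              # line: 2 learnable parameters
coeffs = np.polyfit(x, y, 4)                     # degree-4 polynomial: 5 coefficients
print(slope, offset, coeffs)
```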

7 of 18

The Neural Network Model

  • Neuron or Perceptron
    • Input X is n dimensional, Y is 1 dimensional.
    • Has learnable parameters W = (W₁, W₂, ..., Wₙ) (weights) and b (bias).
    • Y = 𝞂(∑ᵢ WᵢXᵢ + b)
    • 𝞂 is a non-linear activation function.
  • Fully Connected or Linear
    • Y is also multidimensional (dimension m).
    • Has learnable parameters W = (Wᵢⱼ) and b = (bⱼ), where i <= n, j <= m.
    • Y = 𝞂(WX + b) (see the sketch below).
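
A minimal NumPy sketch of a single neuron and of a fully connected layer, using sigmoid as one possible choice of the activation 𝞂 (sizes and data are illustrative):

```python
# Minimal sketch of a neuron and a fully connected layer (NumPy assumed).
import numpy as np

def sigma(z):                       # an example non-linear activation (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

n, m = 4, 3
x = np.random.randn(n)

# Single neuron: weights W (n values) and bias b (a scalar).
W = np.random.randn(n); b = 0.1
y = sigma(np.dot(W, x) + b)         # Y = sigma(sum_i W_i X_i + b)

# Fully connected layer: W is an m x n matrix, b has m entries.
W_fc = np.random.randn(m, n); b_fc = np.zeros(m)
y_fc = sigma(W_fc @ x + b_fc)       # Y = sigma(WX + b), m-dimensional output
```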

8 of 18

MNIST Classification

Input : x is a [28,28] shaped tensor, giving the pixel values of the image.

Output : y is a [10] shaped tensor, giving the probabilities of the digit being 0 to 9.

If the dataset gives y as a digit, convert it to a probability vector by one-hot encoding.

Use the softmax function to convert real-valued outputs to probabilities (a sketch follows below).
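
A small sketch of one-hot encoding a digit label and applying softmax, assuming PyTorch:

```python
# Minimal sketch: one-hot encoding and softmax (PyTorch assumed).
import torch
import torch.nn.functional as F

y = torch.tensor(3)                          # the label given as a digit
y_onehot = F.one_hot(y, num_classes=10)      # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

logits = torch.randn(10)                     # real-valued model outputs
probs = torch.softmax(logits, dim=0)         # probabilities, summing to 1
print(y_onehot, probs.sum())
```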

9 of 18

Multilayered Network

Complex data can only be fit by more complex models.

Obtain complex models by stacking multiple linear layers, with non-linear activations in between.

Multilayered Perceptron (MLP)

  • Multiple Linear layers, one following the other.
  • Y = 𝞂(V𝞂(WX + b) + c)
  • Intermediate outputs are called hidden units (see the sketch below).
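
The formula Y = 𝞂(V𝞂(WX + b) + c) translates directly into code; a NumPy sketch with made-up layer sizes:

```python
# Minimal sketch of a two-layer MLP, Y = sigma(V sigma(WX + b) + c) (NumPy assumed).
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # example activation

x = np.random.randn(8)                       # input
W, b = np.random.randn(16, 8), np.zeros(16)  # first linear layer
V, c = np.random.randn(4, 16), np.zeros(4)   # second linear layer

h = sigma(W @ x + b)                         # hidden units (intermediate outputs)
y = sigma(V @ h + c)                         # final output
```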

10 of 18

A MLP model for MNIST

Pipeline: Reshape → Fully Connected Layer → Fully Connected Layer → Softmax → predicted probabilities p(0), p(1), ..., p(9) for the different digits.
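
A sketch of this pipeline as a PyTorch model; the hidden size of 128 is an arbitrary choice, not from the slides:

```python
# Minimal sketch of the MNIST MLP: reshape -> fully connected -> fully connected -> softmax.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),               # reshape [28, 28] images to 784-dimensional vectors
    nn.Linear(784, 128),        # first fully connected layer
    nn.ReLU(),
    nn.Linear(128, 10),         # second fully connected layer
    nn.Softmax(dim=1),          # probabilities p(0), ..., p(9)
)

x = torch.randn(1, 1, 28, 28)   # a batch with one (dummy) image
print(model(x))                 # 10 predicted probabilities
```

In practice the final softmax is often dropped from the model and folded into the cross-entropy loss during training, as in the training-loop sketch later.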

11 of 18

Training a Model

The process of finding the right parameters for the model.

12 of 18

Loss Function

  • Loss Function : A function that computes the difference between the predicted output and the correct output.
    • Eg: Mean Squared Error (f(x) − y_correct)². y_correct is also called the ground truth.
    • Eg: Cross Entropy Loss −∑ᵢ y_correct(i) log y_pred(i) (a sketch follows below).
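
A sketch of both losses on a single example, assuming PyTorch (the numbers are made up):

```python
# Minimal sketch of mean squared error and cross entropy loss (PyTorch assumed).
import torch

# Mean Squared Error: (f(x) - y_correct)^2, averaged over the outputs.
pred, target = torch.tensor([2.5]), torch.tensor([3.0])
mse = torch.mean((pred - target) ** 2)

# Cross Entropy: -sum_i y_correct(i) * log y_pred(i), with y_correct one-hot.
y_pred = torch.tensor([0.1, 0.7, 0.2])       # predicted probabilities
y_correct = torch.tensor([0.0, 1.0, 0.0])    # one-hot ground truth
ce = -torch.sum(y_correct * torch.log(y_pred))
print(mse.item(), ce.item())
```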

13 of 18

Gradient Descent

Gradient Descent : Change the parameters 𝛉 slightly in the direction that decreases the loss function: 𝛉 ← 𝛉 − η · ∂L/∂𝛉, where η is the learning rate. Gradients are the partial derivatives of the loss function w.r.t. the parameters.
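
A tiny sketch of this update rule on a one-parameter loss L(𝛉) = (𝛉 − 3)², with an illustrative learning rate:

```python
# Minimal sketch of gradient descent on L(theta) = (theta - 3)^2 (pure Python).
theta = 0.0           # initial parameter
eta = 0.1             # learning rate

for step in range(50):
    grad = 2 * (theta - 3.0)     # dL/dtheta, the partial derivative of the loss
    theta = theta - eta * grad   # move slightly so that the loss decreases

print(theta)          # approaches 3, the minimizer of the loss
```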

14 of 18

Backpropagation

Backpropagation : The process of finding the gradients of parameters in a multilayered network.
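
A minimal sketch of backpropagation through a one-hidden-layer network with sigmoid activations and squared error loss, using NumPy; all sizes and the random data are illustrative:

```python
# Minimal sketch of backpropagation: chain rule applied layer by layer, last to first.
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))                      # input (n = 4)
y = np.array([1.0])                            # target (1-dimensional output)
W, b = rng.normal(size=(3, 4)), np.zeros(3)    # first layer
V, c = rng.normal(size=(1, 3)), np.zeros(1)    # second layer

# Forward pass
h = sigmoid(W @ x + b)                         # hidden units
y_pred = sigmoid(V @ h + c)
loss = np.mean((y_pred - y) ** 2)

# Backward pass: gradients of the loss w.r.t. each parameter
d_ypred = 2 * (y_pred - y) / y.size
d_z2 = d_ypred * y_pred * (1 - y_pred)         # through the output sigmoid
dV, dc = np.outer(d_z2, h), d_z2               # gradients of the last layer
d_h = V.T @ d_z2                               # propagate back to the hidden units
d_z1 = d_h * h * (1 - h)                       # through the hidden sigmoid
dW, db = np.outer(d_z1, x), d_z1               # gradients of the first layer
```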

15 of 18

Training Algorithm

  • Initialize model with random parameters.
  • Repeat
    • Take a small random subset of the dataset that will fit in memory (minibatch).
    • Forward Pass : pass the minibatch through the model and obtain predictions.
    • Compute the mean loss over the minibatch.
    • Backward Pass : compute the gradients of the parameters, from the last layer to the first.
    • Update the parameters using the gradients and the learning rate (see the sketch after this list).
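
Putting the steps together, a minimal PyTorch sketch of the training loop; the dummy data, layer sizes and learning rate are illustrative, not from the slides:

```python
# Minimal sketch of the training algorithm (PyTorch assumed).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for MNIST: 512 random "images" with random digit labels.
images = torch.randn(512, 1, 28, 28)
labels = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

# Random initial parameters; CrossEntropyLoss applies the softmax internally,
# so the model outputs raw scores here.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(3):                  # repeat
    for x, y in loader:                 # a minibatch that fits in memory
        preds = model(x)                # forward pass: obtain predictions
        loss = loss_fn(preds, y)        # mean loss over the minibatch
        optimizer.zero_grad()
        loss.backward()                 # backward pass: gradients, last layer to first
        optimizer.step()                # update the parameters using the learning rate
```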

16 of 18

Overfitting

17 of 18

Testing or Inference

18 of 18

Some References