Course Name : MACHINE LEARNING

Course Code : 20AD04

Course Instructor : Dr S Naganjaneyulu

Semester : VI

Regulation : R23

Unit: 4

UNIT-4: SYLLABUS

Linear Discriminants for Machine Learning: Introduction to Linear Discriminants, Linear Discriminants for Classification, Perceptron Classifier, Perceptron Learning Algorithm, Support Vector Machines, Linearly Non-Separable Case, Non-linear SVM, Kernel Trick, Logistic Regression, Linear Regression, Multi-Layer Perceptrons (MLPs), Backpropagation for Training an MLP.

Introduction to Linear Discriminants

  • Linear Discriminant Analysis (LDA) or Normal Discriminant Analysis (NDA) or Discriminant Function Analysis (DFA) is a supervised machine learning algorithm primarily used for dimensionality reduction and classification tasks.
  • The goal is to find a linear combination of features that maximizes the separation between different classes while minimizing the spread (variance) within each class.
  • It is used to project the features in higher dimension space into a lower dimension space.
  • A linear discriminant is a decision function used in classification that separates data points using a linear boundary (a line in 2D, a plane in 3D, or a hyperplane in higher dimensions).

4.3 CLASSIFICATION LEARNING STEPS

Introduction to Perceptron

  • The Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP (McCulloch–Pitts) neuron.
  • A Perceptron is an algorithm for supervised learning of binary classifiers.
  • This algorithm enables neurons to learn, processing elements of the training set one at a time.
  • The perceptron was based on the concept of a simple computational unit, which takes one or more inputs and produces a single output, modeled after the structure and function of a neuron in the brain.
  • The perceptron was designed to be able to learn from examples and adjust its parameters to improve its accuracy in classifying new examples.

STRUCTURE OF A PERCEPTRON

  • The perceptron algorithm was initially used to solve simple problems, such as recognizing handwritten characters, but it soon faced criticism due to its limited capacity to learn complex patterns and its inability to handle non-linearly separable data.
  • These limitations led to the decline of research on perceptrons in the 1960s and 1970s.
  • However, in the 1980s, the development of backpropagation, a powerful algorithm for training multi-layer neural networks, renewed interest in artificial neural networks and sparked a new era of research and innovation in machine learning.
  • Today, perceptrons are regarded as the simplest form of artificial neural networks and are still widely used in applications such as image recognition, natural language processing, and speech recognition.

  • The Perceptron is considered a single-layer neural network with four main parameters: input values, weights, bias, and an activation function.
  • The perceptron model begins by multiplying all input values by their weights, then adds these products to create the weighted sum.
  • Further, this weighted sum is applied to the activation function ‘f’ to obtain the desired output.
  • This activation function is also known as the step function and is represented by ‘f’.

Step 1: Multiply all input values by the corresponding weight values and then add them to calculate the weighted sum.

Step 2: An activation function is applied to the above-mentioned weighted sum, giving us an output either in binary form or as a continuous value.
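The two steps above can be sketched in Python (a minimal illustration; the AND weights below are hand-picked for the demo, not part of the slides):

```python
# Sketch of the two perceptron steps: step 1 computes the weighted sum,
# step 2 applies a binary step activation.

def weighted_sum(inputs, weights, bias):
    # Step 1: multiply each input by its weight and add the bias.
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def step_activation(z):
    # Step 2: binary step function 'f' -- fires 1 if the sum is non-negative.
    return 1 if z >= 0 else 0

def perceptron_output(inputs, weights, bias):
    return step_activation(weighted_sum(inputs, weights, bias))

# Example: a perceptron computing logical AND (weights chosen by hand).
and_weights, and_bias = [1.0, 1.0], -1.5
print(perceptron_output([1, 1], and_weights, and_bias))  # -> 1
print(perceptron_output([1, 0], and_weights, and_bias))  # -> 0
```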

Basic Components of Perceptron:

Perceptron is a type of artificial neural network, which is a fundamental concept in machine learning. The basic components of a perceptron are:

  • Weights: Each input neuron is associated with a weight, which represents the strength of the connection between the input neuron and the output neuron.
  • Bias: A bias term is added to the input layer to provide the perceptron with additional flexibility in modeling complex patterns in the input data.

Input Layer:

  • As the name suggests, it accepts inputs in several different formats provided by the programmer.

Hidden Layer:

  • The hidden layer lies between the input and output layers. It performs all the calculations to find hidden features and patterns.

Output Layer:

  • The input goes through a series of transformations using the hidden layer, which finally results in output that is conveyed using this layer.

  • Activation Function: The activation function determines the output of the perceptron based on the weighted sum of the inputs and the bias term. Common activation functions used in perceptrons include the step function, sigmoid function, and ReLU function.
  • Output: The output of the perceptron is a single binary value, either 0 or 1, which indicates the class or category to which the input data belongs.
  • Training Algorithm: The perceptron is typically trained using a supervised learning algorithm such as the perceptron learning algorithm or backpropagation. During training, the weights and biases of the perceptron are adjusted to minimize the error between the predicted output and the true output for a given set of training examples.
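A minimal sketch of the perceptron learning rule described above, assuming a 0/1 step output and a made-up learning rate; here it learns the linearly separable OR function:

```python
# Perceptron learning rule sketch: nudge each weight by lr * error * input.
# Learning rate, epoch count, and training data are illustrative assumptions.

def train_perceptron(data, labels, lr=0.1, epochs=20):
    """Adjust weights/bias to reduce error on (data, labels); labels are 0/1."""
    n = len(data[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            err = y - pred                              # true minus predicted output
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err                               # bias updated like a weight on input 1
    return w, b

# Learn the linearly separable OR function.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]
w, b = train_perceptron(X, y)
```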

Overall, the perceptron is a simple yet powerful algorithm that can be used to perform binary classification tasks and has paved the way for more complex neural networks used in deep learning today.

Types of Perceptron:

  • Single layer: A single-layer perceptron can learn only linearly separable patterns.
  • Multilayer: A multilayer perceptron has two or more layers, giving it greater processing power.
  • The Perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary.

Elements of Neural Networks (Cont’d)

FIG: A neuron with inputs, weights, bias, an activation function, and an output.

  • Single-layer Perceptron

This is the simplest feedforward neural network and does not contain any hidden layer, which means it consists of only a single layer of output nodes. It is said to be single-layer because, when counting layers, we do not include the input layer; no computation is done at the input layer, as the inputs are fed directly to the outputs via a series of weights.

MULTI-LAYER PERCEPTRONS

  • A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). MLP models are the most basic deep neural networks, composed of a series of fully connected layers. Today, MLP methods can be used where the high computing power demanded by modern deep learning architectures is not available.
  • Each new layer is a set of nonlinear functions of a weighted sum of all outputs (fully connected) from the prior one.

  • Forward Stage: Activations start from the input layer, propagate through the network, and terminate at the output layer.
  • Backward Stage: In the backward stage, weight and bias values are modified per the model’s requirement. The error between the actual output and the desired output is propagated backward, starting from the output layer.
  • A multilayer perceptron model has greater processing power and can process linear and non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, XNOR, and NOR.
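As a sketch of the XOR claim above, a two-layer perceptron with hand-chosen (not learned) weights; a single-layer perceptron cannot compute XOR because it is not linearly separable:

```python
# XOR via a multilayer perceptron with fixed, hand-picked weights.

def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: h1 fires for "at least one input on", h2 for "both on".
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output layer: "h1 AND NOT h2" gives exclusive-or.
    return step(h1 - h2 - 0.5)

print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # -> [0, 1, 1, 0]
```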

Inputs of a Perceptron:

  • A Perceptron accepts inputs, moderates them with certain weight values, then applies the transformation function to output the final result.
  • A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc.
  • It has only two values: Yes/No or True/False. The summation function “∑” multiplies all inputs “x” by their weights “w” and then adds them up: z = w1x1 + w2x2 + … + wnxn.

Output of Perceptron

Forward propagation & Backward Propagation

Forward Propagation:

  • Forward propagation is the way data moves from left (input layer) to right (output layer) in the neural network.
  • Input data is fed into the network and flows through the layers. Each neuron in the input layer receives an input value and passes it to the neurons in the first hidden layer.
  • Each connection between neurons is assigned a weight, which is multiplied by the input value. These weighted inputs are summed up, and a bias (constant) is added to introduce flexibility.
  • The weighted sum is passed through an activation function, which introduces non-linearity into the network.
  • The activation function determines whether the neuron should be activated or not based on the input it receives.
  • The output of the activation function becomes the input to the neurons in the next layer. The process continues until the output layer is reached.
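A hedged sketch of forward propagation through one hidden layer; the layer sizes, sigmoid activation, and weight values below are illustrative assumptions:

```python
import math

# Forward propagation: each neuron computes a weighted sum of all its
# inputs plus a bias, then applies an activation function (sigmoid here).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # One output per neuron: activation(weighted sum + bias).
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]                                   # input layer values (made up)
W1, b1 = [[0.4, 0.3], [-0.2, 0.6]], [0.1, 0.0]    # hidden layer: 2 neurons
W2, b2 = [[0.7, -0.5]], [0.2]                     # output layer: 1 neuron

hidden = layer_forward(x, W1, b1)       # input  -> hidden
output = layer_forward(hidden, W2, b2)  # hidden -> output
```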

Backward Propagation

  • Backward propagation is the process of moving from right (output layer) to left (input layer).
  • It adjusts the weights based on the difference between the predicted output and the desired output.
  • It calculates the error at the output layer and propagates it backwards through the network. The weights are updated using gradient descent, taking the derivative of the error with respect to each weight.
  • The weights are adjusted in the direction opposite to the gradient, aiming to minimize the error.
  • The process is repeated iteratively, refining the network weights to improve predictions.
  • One full pass of forward and backward propagation over the training set is called an epoch; epochs continue until the network reaches a satisfactory level of accuracy.
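A minimal sketch of the gradient-descent update described above, for a single sigmoid neuron with squared error (the data point and learning rate are made up):

```python
import math

# One gradient-descent step for a single sigmoid neuron with squared error
# E = 0.5 * (pred - target)^2; weights move opposite the gradient.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_step(w, b, x, target, lr=0.5):
    pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # dE/dz = (pred - target) * sigmoid'(z), with sigmoid'(z) = pred*(1-pred)
    delta = (pred - target) * pred * (1.0 - pred)
    # dE/dw_i = delta * x_i; subtract to move opposite the gradient.
    w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    b = b - lr * delta
    return w, b, pred

w, b = [0.5, -0.5], 0.0
for _ in range(200):                 # repeated passes refine the weights
    w, b, pred = update_step(w, b, [1.0, 2.0], target=1.0)
# After many iterations, pred approaches the target of 1.0.
```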

Support Vector Machines

  • SVM is one of the most popular supervised learning algorithms, used for classification as well as regression problems.
  • Primarily, it is used for classification problems in machine learning.
  • The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put new data points in the correct category in the future.
  • SVM is based on the concept of a surface, called a hyperplane, which draws a boundary between data instances plotted in the multi-dimensional feature space.
  • The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Key concepts of Support Vector Machines

  • Support Vectors: The data points that are closest to the hyperplane are called support vectors; the separating line is defined with the help of these data points.
  • Hyperplane: A decision plane (or space) that divides a set of objects belonging to different classes.
  • Margin: The gap between the two lines through the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin a bad margin.

Types of Support Vector Machines

  • Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, it is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
  • Non-Linear SVM: Non-Linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, it is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Support Vector Machine Algorithm Steps

The basic steps of SVM are:

  • Select two hyperplanes (lines, in 2D) which separate the data with no points between them (the red lines in the figure).
  • Maximize their distance (their margin).
  • The average line (here, the line halfway between the two red lines) will be the decision boundary.
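For intuition, the quantities in these steps can be sketched with hand-picked (not learned) weights: the two margin hyperplanes w·x + b = ±1, the margin 2/‖w‖ between them, and the midway decision boundary:

```python
import math

# Illustrative SVM geometry (weights chosen by hand, not learned): the two
# margin hyperplanes are w.x + b = +1 and w.x + b = -1; the decision
# boundary w.x + b = 0 lies halfway between them.

w, b = [2.0, 0.0], -1.0          # assumed separating direction in 2D

def decision(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    # Side of the middle hyperplane decides the class.
    return 1 if decision(x) >= 0 else -1

# Distance between the two margin hyperplanes (the "red lines"): 2 / ||w||.
margin = 2.0 / math.sqrt(sum(wi * wi for wi in w))

print(classify([2.0, 0.0]), classify([0.0, 0.0]), margin)  # -> 1 -1 1.0
```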

Support Vector Machines Kernel

Kernel trick

  • Some of the common kernel functions for transforming from a lower-dimensional space to a higher-dimensional space, used by different SVM implementations, are as follows:

Linear kernel: K(x, y) = xᵀy

Polynomial kernel: K(x, y) = (xᵀy + c)ᵈ

Sigmoid kernel: K(x, y) = tanh(γxᵀy + c)

Gaussian RBF kernel: K(x, y) = exp(−γ‖x − y‖²)

  • When data instances of the classes are close to each other, the kernel trick method can be used.
  • The effectiveness of SVM depends both on the:
    • selection of the kernel function
    • adoption of values for the kernel parameters
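These standard kernel functions written out in Python (the hyperparameter values γ, c, and d below are arbitrary choices the user must tune):

```python
import math

# The four common SVM kernels; gamma, c, and d are tunable hyperparameters.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, c=1.0, d=3):
    return (dot(x, y) + c) ** d

def sigmoid_kernel(x, y, gamma=0.1, c=0.0):
    return math.tanh(gamma * dot(x, y) + c)

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, y))   # -> 2.0
print(rbf_kernel(x, x))      # -> 1.0 (identical points have maximum similarity)
```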

Support Vector Machines

Classification using hyperplanes

  • In SVM, a model is built to discriminate the data instances belonging to different classes.
  • For simplicity, we assume that the data instances are linearly separable.
  • In this case, when mapped in a two-dimensional space, the data instances belonging to different classes fall on different sides of a straight line drawn in that space, as shown in figure (a).
  • If the same concept is extended to a multidimensional feature space, the straight line dividing data instances belonging to different classes transforms to a hyperplane as shown in figure (b).

FIG: Linearly separable data instances

Support Vector Machines

Classification using Hyperplanes

For example,

  • In a two-dimensional feature space (data set having two features and a class variable), a hyperplane will be a one-dimensional subspace or a straight line.
  • Mathematically, in a two-dimensional space, hyperplane can be defined by the equation:

c0 + c1 X1 + c2 X2 = 0

  • In a three-dimensional feature space (data set having three features and a class variable), hyperplane is a two-dimensional subspace or a simple plane.
  • In an N-dimensional space, hyperplane can be defined by the equation:

c0 + c1 X1 + c2 X2 + … + cn Xn = 0

  • When a new testing data point/data set is added, the side of the hyperplane it lands on will decide the class that we assign to it. The distance between hyperplane and data points is known as margin.
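A small sketch of this test-time rule, with arbitrary example coefficients: the sign of c0 + c1X1 + … + cnXn tells us which side of the hyperplane a new point lands on:

```python
# Which side of the hyperplane does a new point land on?
# The coefficients here are arbitrary assumptions for a 2D example:
# the hyperplane -1 + X1 + X2 = 0.

coeffs = [-1.0, 1.0, 1.0]        # c0, c1, c2

def side_of_hyperplane(point, coeffs):
    # Evaluate c0 + c1*X1 + ... + cn*Xn and read off its sign.
    value = coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], point))
    return "positive side" if value >= 0 else "negative side"

print(side_of_hyperplane([1.0, 1.0], coeffs))   # -> positive side (value = 1)
print(side_of_hyperplane([0.0, 0.0], coeffs))   # -> negative side (value = -1)
```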

Support Vector Machines

Identifying the correct hyperplane in SVM

  • There may be multiple options for hyperplanes dividing the data instances belonging to the different classes.
  • We need to identify which one will result in the best classification.

Scenario 1:

  • In this scenario, we have three hyperplanes: A, B, and C.
  • Now, we need to identify the correct hyperplane which better segregates the two classes represented by the triangles and circles.
  • We can see, hyperplane ‘A’ has performed this task quite well.

Support vector machine: Scenario 1

Support Vector Machines

Identifying the correct hyperplane in SVM

Scenario 2:

  • In this scenario, we have three hyperplanes: A, B, and C.
  • We have to identify the correct hyperplane which classifies the triangles and circles in the best possible way.
  • Here, maximizing the distance between the nearest data points of both classes and the hyperplane will help us decide the correct hyperplane. This distance is called the margin.
  • In the below figure, the margin for hyperplane A is high compared to those for both B and C. Hence, hyperplane A is the correct hyperplane.

Support vector machine: Scenario 2

Support Vector Machines

Identifying the correct hyperplane in SVM

Scenario 3:

  • If we use the rule discussed in Scenario 2 to identify the correct hyperplane in the below figure, we might select hyperplane B, as it has a higher margin than A.
  • However, SVM selects the hyperplane which classifies the classes accurately before maximizing the margin.
  • Here, hyperplane B has a classification error, while A has classified all data instances correctly. Therefore, A is the correct hyperplane.

Support vector machine: Scenario 3

Support Vector Machines Strengths and Weakness

Strengths of SVM:

  • SVM can be used for both classification and regression.
  • It is robust, i.e. not much impacted by data with noise or outliers.
  • The prediction results using this model are very promising.

Weaknesses of SVM

  • The basic SVM formulation is applicable only to binary classification, i.e. when there are only two classes in the problem domain (multi-class problems require decomposition strategies such as one-vs-rest).
  • The SVM model is very complex – almost like a black box when it deals with a high-dimensional data set. Hence, it is very difficult and close to impossible to understand the model in such cases.
  • It is slow for a large dataset, i.e. a data set with either a large number of features or a large number of instances.
  • It is quite memory-intensive.

Support Vector Machines Applications

Applications of SVM:

  • SVM is most effective when it is used for binary classification, i.e. for solving a machine learning problem with two classes.
  • Image classification
  • Image segmentation
  • Hand-written character recognition
  • Text categorization
  • Financial distress prediction
  • Prediction of common diseases
  • Sentiment Analysis
  • Encryption
  • Geo-spatial applications
  • Speech Recognition

Logistic Regression

  • Logistic Regression Equation: The logistic regression equation can be obtained from the linear regression equation. The mathematical steps to get the logistic regression equation are given below:

y = b0 + b1x1 + b2x2 + … + bnxn

  • In logistic regression, y can be between 0 and 1 only, so we divide the above equation by (1 − y):

y / (1 − y)

  • But we need a range between −∞ and +∞, so taking the logarithm of the equation, it becomes:

log[y / (1 − y)] = b0 + b1x1 + b2x2 + … + bnxn
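This derivation can be checked numerically; the coefficients below are illustrative:

```python
import math

# The logistic regression derivation in code: a linear score z, the
# probability y = sigmoid(z), and the log-odds log(y/(1-y)), which
# recovers exactly the linear function of the inputs.

def sigmoid(z):
    # Inverse of the logit: maps any real z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -1.0, 2.0               # illustrative coefficients
x = 1.5
z = b0 + b1 * x                  # linear part: ranges over -inf .. +inf
y = sigmoid(z)                   # probability: squeezed into (0, 1)
log_odds = math.log(y / (1 - y)) # recovering z from y

print(round(z, 6), round(log_odds, 6))  # -> 2.0 2.0
```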

Types of Logistic Regression

On the basis of the categories, Logistic Regression can be classified into three types:

o Binomial: In binomial Logistic regression, there can be only two possible types of the dependent variables, such as 0 or 1, Pass or Fail, etc.

o Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dogs", or "sheep"

o Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of dependent variables, such as "low", "Medium", or "High".

Types of Linear Regression

Linear regression can be further divided into two types of the algorithm:

o Simple Linear Regression: If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.

o Multiple Linear regression: If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.
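A sketch of simple linear regression via least squares (one independent variable; the data points below are made up so the true line is y = 2x + 1):

```python
# Simple linear regression: fit y = b0 + b1*x by ordinary least squares.

def fit_simple_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept from the means.
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return b0, b1

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                # exactly y = 2x + 1
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)                    # -> 1.0 2.0
```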

Types of Linear Regression

Linear Regression Line: A straight line showing the relationship between the dependent and independent variables is called a regression line.

A regression line can show two types of relationship:

o Positive Linear Relationship: If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a positive linear relationship.

o Negative Linear Relationship: If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a negative linear relationship.
