1 of 85

Learning with Regression and Trees

2 of 85

Regression Analysis in Machine Learning

  • Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
  • More specifically, regression analysis helps us understand how the value of the dependent variable changes with respect to one independent variable while the other independent variables are held fixed.
  • It predicts continuous/real values such as temperature, age, salary, price, etc.
  • "Regression shows a line or curve that passes through all the data points on target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimum."
  • Some examples of regression can be as:
    • Prediction of rain using temperature and other factors
    • Determining Market trends
    • Prediction of road accidents due to rash driving.

3 of 85

Terminologies Related to the Regression Analysis:

  • Dependent Variable: The main factor in regression analysis that we want to predict or understand is called the dependent variable. It is also called the target variable.
  • Independent Variable: The factors that affect the dependent variable, or that are used to predict its values, are called independent variables, also known as predictors.
  • Outliers: An outlier is an observation with a very low or very high value in comparison to the other observed values. Outliers may distort the results, so they should be handled carefully.
  • Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
  • Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. If our algorithm does not perform well even with the training dataset, the problem is called underfitting.

4 of 85

Why do we use Regression Analysis?

  • As mentioned above, Regression analysis helps in the prediction of a continuous variable.
  • There are various scenarios in the real world where we need future predictions, such as weather conditions, sales, or marketing trends, and for such cases we need a technique that can make predictions accurately.
  • Regression analysis is such a statistical method, and it is widely used in machine learning and data science.
  • Below are some other reasons for using Regression analysis:
    • Regression estimates the relationship between the target and the independent variable.
    • It is used to find the trends in data.
    • It helps to predict real/continuous values.
    • By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the others.

5 of 85

Types of Regression

There are various types of regressions which are used in data science and machine learning.

Each type has its own importance in different scenarios, but at the core, all regression methods analyze the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, given below:

  • Linear Regression
  • Logistic Regression
  • Polynomial Regression
  • Support Vector Regression
  • Decision Tree Regression
  • Random Forest Regression
  • Ridge Regression
  • Lasso Regression

6 of 85

Types of Regression Models:

7 of 85

Linear Regression:

  • Linear regression is a statistical regression method which is used for predictive analysis.
  • It is one of the simplest algorithms; it works on regression and shows the relationship between continuous variables.
  • It is used for solving the regression problem in machine learning.
  • Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence called linear regression.
  • If there is only one input variable (x), then such linear regression is called simple linear regression.
  • And if there is more than one input variable, then such linear regression is called multiple linear regression.
  • The relationship between variables in the linear regression model can be explained with a simple example: predicting the salary of an employee on the basis of years of experience.
  • The mathematical equation for linear regression is: Y = aX + b

Here, Y = dependent variable (target variable),

X = independent variable (predictor variable),

a and b are the linear coefficients.
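
The slides do not include code, but a minimal Python sketch of fitting Y = aX + b can make this concrete. It uses scikit-learn, and the years-of-experience / salary numbers are made up purely for illustration.

    # A minimal sketch of simple linear regression (Y = aX + b) with scikit-learn.
    # The experience/salary values below are invented for illustration only.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1], [2], [3], [4], [5]])             # years of experience (predictor)
    y = np.array([30000, 35000, 41000, 46000, 52000])   # salary (target)

    model = LinearRegression().fit(X, y)
    print("a (slope):", model.coef_[0])
    print("b (intercept):", model.intercept_)
    print("predicted salary for 6 years:", model.predict([[6]])[0])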

8 of 85

Types of Linear Regression

Linear regression can be further divided into two types of the algorithm:

  • Simple Linear Regression: If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.
  • Multiple Linear Regression: If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

Some popular applications of linear regression are:

  • Analyzing trends and sales estimates
  • Salary forecasting
  • Real estate prediction
  • Arriving at ETAs in traffic.

9 of 85

Logistic Regression:

  • Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format such as 0 or 1.
  • The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
  • It is a predictive analysis algorithm which works on the concept of probability.
  • Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
  • Logistic regression uses the sigmoid (logistic) function to model the data. The function can be represented as:

10 of 85

The sigmoid function can be represented as:

f(x) = 1 / (1 + e^(-x))

  • f(x) = output, a value between 0 and 1.
  • x = input to the function.
  • e = base of the natural logarithm.

When we provide the input values (data) to the function, it gives the S-curve as follows:

It uses the concept of a threshold: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
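
As a small illustration of the S-curve and the threshold rule described above, here is a Python sketch (not part of the original slides); the example scores are made up.

    # A small sketch of the sigmoid function and a 0.5 threshold rule.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    scores = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])   # example linear scores
    probs = sigmoid(scores)                           # outputs between 0 and 1
    labels = (probs >= 0.5).astype(int)               # threshold at 0.5
    print(probs)
    print(labels)   # values above the threshold become 1, below become 0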

There are three types of logistic regression:

  • Binary (0/1, pass/fail)
  • Multinomial (cats, dogs, lions)
  • Ordinal (low, medium, high)

11 of 85

Polynomial Regression:

  • Polynomial Regression is a type of regression which models a non-linear dataset using a linear model.
  • It is similar to multiple linear regression, but it fits a non-linear curve between the values of x and the corresponding conditional values of y.
  • Suppose there is a dataset whose data points are arranged in a non-linear fashion; in such a case, linear regression will not fit those data points well. To cover such data points, we need polynomial regression.
  • In polynomial regression, the original features are transformed into polynomial features of a given degree and then modelled using a linear model, which means the data points are best fitted using a polynomial curve.
  • The equation for polynomial regression is also derived from the linear regression equation: the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
  • Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is our independent/input variable.
  • The model is still linear because it is linear in the coefficients, even though the transformed features (x², x³, ...) are non-linear; a short sketch of this transformation follows.
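
A minimal Python sketch of this idea, assuming scikit-learn and a synthetic dataset generated only for illustration: the features are expanded to x, x², x³ and an ordinary linear model is fitted on them.

    # Polynomial regression sketch: transform x into polynomial features,
    # then fit a plain linear model on the transformed features.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.RandomState(0)
    X = np.linspace(-3, 3, 50).reshape(-1, 1)
    y = 0.5 * X.ravel() ** 3 - X.ravel() + rng.normal(scale=1.0, size=50)

    # degree=3 creates the features x, x^2, x^3; the model stays linear in b0..b3
    model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
    model.fit(X, y)
    print(model.predict([[2.0]]))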

12 of 85

Support Vector Regression:

Support Vector Machine is a supervised learning algorithm which can be used for regression as well as classification problems. So if we use it for regression problems, then it is termed as Support Vector Regression.

Support Vector Regression is a regression algorithm which works for continuous variables. Below are some keywords which are used in Support Vector Regression:

  • Kernel: A function used to map lower-dimensional data into a higher-dimensional space.
  • Hyperplane: In a general SVM, it is a separation line between two classes, but in SVR it is the line which helps to predict the continuous variable and covers most of the data points.
  • Boundary line: The two lines drawn on either side of the hyperplane, which create a margin for the data points.
  • Support vectors: The data points which are nearest to the hyperplane and define the boundary lines.

13 of 85

In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of data points is covered within that margin. The main goal of SVR is to include as many data points as possible within the boundary lines, and the hyperplane (best-fit line) must cover the maximum number of data points. In the corresponding figure:

Here, the blue line is called hyperplane, and the other two lines are known as boundary lines.
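
A brief Python sketch of SVR, assuming scikit-learn and a synthetic sine-shaped dataset invented for illustration; the epsilon parameter controls the width of the margin between the boundary lines.

    # Support Vector Regression sketch with an RBF kernel.
    # Points inside the epsilon tube (between the boundary lines) add no loss.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X = np.sort(5 * rng.rand(40, 1), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.randn(40)

    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
    svr.fit(X, y)
    print("number of support vectors:", len(svr.support_))
    print(svr.predict([[2.5]]))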

14 of 85

Decision Tree Regression:

Decision Tree is a supervised learning algorithm which can be used for solving both classification and regression problems.

It can solve problems for both categorical and numerical data.

Decision tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result.

A decision tree is constructed starting from the root node (the whole dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own child nodes, thereby becoming the parent nodes of those nodes.
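
A small Python sketch of decision tree regression, assuming scikit-learn and made-up numbers: the tree splits the data into regions and predicts the mean target value of each leaf.

    # Decision tree regression sketch: splits the data into subsets and
    # predicts the mean target value within each leaf.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
    y = np.array([1.1, 1.9, 3.2, 3.8, 7.5, 8.1, 8.0, 9.2])

    tree = DecisionTreeRegressor(max_depth=2)   # limit depth to keep the tree small
    tree.fit(X, y)
    print(tree.predict([[2.5], [6.5]]))         # leaf means for the two regions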

15 of 85

Ridge Regression:

  • Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
  • The amount of bias added to the model is known as the ridge regression penalty. The penalty term is computed by multiplying lambda by the sum of the squared weights of the individual features.
  • The cost function for ridge regression therefore adds this penalty to the usual sum of squared errors (a short sketch follows this list):

Cost = Σ(yi - ŷi)² + λ Σ bj²
  • A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, ridge regression can be used.
  • Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
  • It helps to solve problems where we have more parameters than samples.
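
A minimal Python sketch of ridge regression, assuming scikit-learn, where the alpha parameter plays the role of lambda above; the data is synthetic and only illustrative.

    # Ridge (L2) regression sketch: larger alpha shrinks the coefficients more.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.rand(50, 3)
    y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

    for alpha in (0.01, 1.0, 10.0):
        ridge = Ridge(alpha=alpha).fit(X, y)
        print(alpha, ridge.coef_)   # coefficients shrink toward 0 as alpha grows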

16 of 85

Lasso Regression:

  • Lasso regression is another regularization technique used to reduce the complexity of the model.
  • It is similar to ridge regression, except that the penalty term contains the absolute values of the weights instead of their squares.
  • Since it takes absolute values, it can shrink a coefficient all the way to 0, whereas ridge regression can only shrink it close to 0.
  • It is also called L1 regularization. The cost function for lasso regression adds the absolute-value penalty to the sum of squared errors:

Cost = Σ(yi - ŷi)² + λ Σ |bj|
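
A short Python sketch of lasso regression, assuming scikit-learn and a synthetic dataset in which only the first feature actually matters; it shows how L1 regularization can push irrelevant coefficients to (or very near) zero.

    # Lasso (L1) regression sketch: irrelevant coefficients are driven toward 0.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    X = rng.rand(50, 5)
    y = 4 * X[:, 0] + rng.normal(scale=0.1, size=50)   # only feature 0 matters

    lasso = Lasso(alpha=0.05).fit(X, y)
    print(lasso.coef_)   # coefficients of the irrelevant features end up at or near 0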

17 of 85

Overfitting and Underfitting in Machine Learning

  • Overfitting and Underfitting are the two main problems that occur in machine learning and degrade the performance of the machine learning models.
  • The main goal of each machine learning model is to generalize well.
  • Here, generalization is the ability of an ML model to provide a suitable output for new, unseen inputs.
  • It means that after being trained on the dataset, the model can produce reliable and accurate outputs.
  • Hence, underfitting and overfitting are the two conditions that need to be checked to judge whether the model is generalizing well or not.

18 of 85

Before understanding overfitting and underfitting, let's understand some basic terms that will help us understand this topic well:

  • Signal: It refers to the true underlying pattern of the data that helps the machine learning model to learn from the data.
  • Noise: Noise is unnecessary and irrelevant data that reduces the performance of the model.
  • Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the machine learning algorithms. Or it is the difference between the predicted values and the actual values.
  • Variance: If the machine learning model performs well with the training dataset, but does not perform well with the test dataset, then variance occurs.

19 of 85

Overfitting

  • Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset.
  • Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model.
  • The overfitted model has low bias and high variance.
  • The chances of overfitting increase the more we train our model: the longer we train, the higher the chance of obtaining an overfitted model.
  • Overfitting is the main problem that occurs in supervised learning.

20 of 85

How to avoid the Overfitting in Model

  • Both overfitting and underfitting degrade the performance of a machine learning model, but the more common problem is overfitting, so there are some ways by which we can reduce its occurrence in our model (a cross-validation sketch follows this list):
    • Cross-Validation
    • Training with more data
    • Removing features
    • Early stopping the training
    • Regularization
    • Ensembling
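
To illustrate the first remedy, here is a Python sketch (not from the slides) of k-fold cross-validation, assuming scikit-learn and a synthetic dataset: scores on held-out folds expose a model that merely memorizes the training data.

    # Cross-validation sketch: compare an overfitting-prone deep tree with a
    # constrained shallow tree on held-out folds. Data is synthetic.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(100, 1)
    y = np.sin(4 * X).ravel() + 0.2 * rng.randn(100)

    deep_tree = DecisionTreeRegressor(max_depth=None)   # prone to overfitting
    shallow_tree = DecisionTreeRegressor(max_depth=3)   # more constrained

    print(cross_val_score(deep_tree, X, y, cv=5).mean())     # held-out R^2 score
    print(cross_val_score(shallow_tree, X, y, cv=5).mean())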

21 of 85

Underfitting

  • Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data.
  • To avoid overfitting, the feeding of training data can be stopped at an early stage, due to which the model may not learn enough from the training data.
  • As a result, it may fail to find the best fit of the dominant trend in the data.
  • In the case of underfitting, the model is not able to learn enough from the training data, and hence it reduces the accuracy and produces unreliable predictions.
  • An underfitted model has high bias and low variance.

How to avoid underfitting:

  • By increasing the training time of the model.
  • By increasing the number of features.

22 of 85

Goodness of Fit

  • The "Goodness of fit" term is taken from the statistics, and the goal of the machine learning models to achieve the goodness of fit. In statistics modeling, it defines how closely the result or predicted values match the true values of the dataset.
  • The model with a good fit is between the underfitted and overfitted model, and ideally, it makes predictions with 0 errors, but in practice, it is difficult to achieve it.
  • As when we train our model for a time, the errors in the training data go down, and the same happens with test data.
  • But if we train the model for a long duration, then the performance of the model may decrease due to the overfitting, as the model also learn the noise present in the dataset.
  • The errors in the test dataset start increasing, so the point, just before the raising of errors, is the good point, and we can stop here for achieving a good model.

23 of 85

Bias and Variance in Machine Learning

  • If a machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as bias and variance.
  • In machine learning, these errors will always be present, as there is always a slight difference between the model's predictions and the actual values.
  • The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results.

24 of 85

Errors in Machine Learning?

  • In machine learning, an error is a measure of how accurately an algorithm can make predictions for the previously unknown dataset.
  • On the basis of these errors, the machine learning model is selected that can perform best on the particular dataset.
  • There are mainly two types of errors in machine learning, which are:
    • Reducible errors: These errors can be reduced to improve the model accuracy. Such errors can further be classified into bias and Variance.
    • Irreducible errors: These errors will always be present in the model

25 of 85

What is Bias?

  • While making predictions, a difference occurs between prediction values made by the model and actual values/expected values, and this difference is known as bias errors or Errors due to bias.
  • A model has either:
    • Low Bias: A low bias model will make fewer assumptions about the form of the target function.
    • High Bias: A model with a high bias makes more assumptions, and the model becomes unable to capture the important features of our dataset. A high bias model also cannot perform well on new data.
  • Some examples of machine learning algorithms with low bias are
    • Decision Trees,
    • k-Nearest Neighbours and
    • Support Vector Machines.
  • Some examples of machine learning algorithms with high bias are
    • Linear Regression,
    • Linear Discriminant Analysis and
    • Logistic Regression.

26 of 85

What is a Variance Error?

Variance tells how much a random variable differs from its expected value.

  • Low variance means there is a small variation in the prediction of the target function with changes in the training data set. At the same time,
  • High variance shows a large variation in the prediction of the target function with changes in the training dataset.

A model that shows high variance learns a lot and performs well with the training dataset, but does not generalize well to an unseen dataset. As a result, such a model gives good results with the training dataset but shows high error rates on the test dataset.

Since, with high variance, the model learns too much from the dataset, it leads to overfitting of the model. A model with high variance has the problems below:

  • A high variance model leads to overfitting.
  • It increases model complexity.

27 of 85

Ways to reduce High Bias:

High bias mainly occurs when the model is too simple. Below are some ways to reduce high bias:

  • Increase the input features as the model is underfitted.
  • Decrease the regularization term.
  • Use more complex models, such as including some polynomial features.

Ways to Reduce High Variance:

  • Reduce the input features or number of parameters as a model is overfitted.
  • Do not use an overly complex model.
  • Increase the training data.
  • Increase the Regularization term.

29 of 85

Different Combinations of Bias-Variance

There are four possible combinations of bias and variances

  1. Low-Bias, Low-Variance: The combination of low bias and low variance shows an ideal machine learning model. However, it is not practically achievable.
  2. Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters, which leads to overfitting.
  3. High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses few parameters. It leads to underfitting problems in the model.
  4. High-Bias, High-Variance: With high bias and high variance, predictions are inconsistent and also inaccurate on average.

30 of 85

Bias-Variance Trade-Off

  • While building the machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting in the model.
  • If the model is very simple with fewer parameters, it may have low variance and high bias.
  • Whereas, if the model has a large number of parameters, it will have high variance and low bias.
  • So, we need to balance bias and variance errors, and this balance between the bias error and variance error is known as the Bias-Variance trade-off.

For an accurate prediction of the model, algorithms need a low variance and low bias. But this is not possible because bias and variance are related to each other:

  • If we decrease the variance, it will increase the bias.
  • If we decrease the bias, it will increase the variance.

31 of 85

  • Bias-Variance trade-off is a central issue in supervised learning.
  • Ideally, we need a model that accurately captures the regularities in the training data and simultaneously generalizes well to unseen data. Unfortunately, doing both perfectly at the same time is not possible.
  • A high variance algorithm may perform well with training data, but it may overfit to noisy data.
  • Whereas a high bias algorithm produces a much simpler model that may not even capture important regularities in the data. So, we need to find a sweet spot between bias and variance to build an optimal model.

Hence, the Bias-Variance trade-off is about finding the sweet spot to make a balance between bias and variance errors.

33 of 85

Linear Regression

  • Linear regression is one of the easiest and most popular Machine Learning algorithms.
  • It is a statistical method that is used for predictive analysis.
  • Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
  • The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence it is called linear regression.
  • Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.

34 of 85

Types of Linear Regression

  • Simple Linear Regression: If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.
  • Multiple Linear Regression: If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

35 of 85

Linear Regression Line

A linear line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:

Negative Linear Relationship:

If the dependent variable decreases on the Y-axis and independent variable increases on the X-axis, then such a relationship is called a negative linear relationship.

Positive Linear Relationship:

If the dependent variable increases on the Y-axis and independent variable increases on X-axis, then such a relationship is termed as a Positive linear relationship.

36 of 85

Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line, which means that the error between the predicted values and the actual values should be minimized. The best fit line will have the least error.

Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line; to do this, we use a cost function.

Cost function-

  • The different values for the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
  • The cost function optimizes the regression coefficients or weights and measures how well a linear regression model is performing.
  • We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the hypothesis function. A common cost function for linear regression, sketched below, is the mean squared error.
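
A tiny Python sketch (not part of the original slides) of a mean squared error cost for a candidate line ŷ = a0 + a1·x; the x/y values are made up, and a lower cost means a better fit.

    # Mean squared error cost for a candidate line y_hat = a0 + a1 * x.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    def cost(a0, a1):
        y_hat = a0 + a1 * x                 # hypothesis function
        return np.mean((y - y_hat) ** 2)    # mean squared error

    print(cost(0.0, 2.0))   # a good guess for this data -> small cost
    print(cost(0.0, 0.5))   # a poor guess -> large cost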

37 of 85

What is Linear Regression?

  • Simple linear regression is a regression technique in which the independent variable has a linear relationship with the dependent variable. The straight line in the diagram is the best fit line. The main goal of the simple linear regression is to consider the given data points and plot the best fit line to fit the model in the best way possible.

38 of 85

Linear Regression Terminologies

Cost Function

The best fit line can be based on the linear equation given below:

Y = b0 + b1·x + e

  • The dependent variable that is to be predicted is denoted by Y.
  • The intercept b0 is the point where the line crosses the y-axis.
  • b1 is the slope of the line; x represents the independent variable that determines the prediction of Y.
  • The error in the resulting prediction is denoted by e.

39 of 85

Advantages And Disadvantages

Advantages:

  • Linear regression performs exceptionally well for linearly separable data
  • It is easier to implement and interpret, and efficient to train
  • It handles overfitting pretty well using dimensionality reduction techniques, regularization, and cross-validation
  • One more advantage is extrapolation beyond a specific data set

Disadvantages:

  • The assumption of linearity between the dependent and independent variables
  • It is often quite prone to noise and overfitting
  • Linear regression is quite sensitive to outliers
  • It is prone to multicollinearity

58 of 85

Linear Regression Formula

  • Linear regression shows the linear relationship between two variables.
  • The equation of linear regression is similar to that of the slope formula.
  • Linear Regression Formula is given by the equation

Y= a + bX

We will find the values of a and b by using the formulas below:

b = [ n Σ(xy) - (Σx)(Σy) ] / [ n Σ(x²) - (Σx)² ]

a = [ Σy - b Σx ] / n

Where,

x and y are the two variables on the regression line,

b = slope of the line,

a = y-intercept of the line,

x = values of the first data set,

y = values of the second data set,

n = number of data points.
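
As a quick check, the following Python sketch (not from the slides) evaluates these formulas on a made-up data set and compares the result with numpy's polyfit.

    # Least-squares slope b and intercept a, computed with the formulas above.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
    n = len(x)

    b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    a = (np.sum(y) - b * np.sum(x)) / n
    print("a =", a, "b =", b)
    print(np.polyfit(x, y, 1))   # returns [slope, intercept] for comparison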

59 of 85

Least Square Regression Line or Linear Regression Line

  • The most popular method to fit a regression line in an XY plot is the method of least squares.
  • This process determines the best-fitting line for the given data by reducing the sum of the squares of the vertical deviations from each data point to the line.
  • If a point rests exactly on the fitted line, then its vertical deviation is 0.
  • Because the deviations are first squared and then added, their positive and negative values do not cancel out.
  • Linear regression determines the straight line, known as the least-squares regression line or LSRL. Suppose Y is a dependent variable and X is an independent variable, then the population regression line is given by the equation

Y= B0+B1X

Where

B0 is a constant

B1 is the regression coefficient

  • When a random sample of observations is given, then the regression line is expressed as;

ŷ = b0+b1x

where b0 is a constant

b1 is the regression coefficient,

x is the independent variable,

ŷ is known as the predicted value of the dependent variable.

60 of 85

Properties of Linear Regression

For the regression line where the regression parameters b0 and b1 are defined, the following properties apply:

  • The regression line reduces the sum of squared differences between observed values and predicted values.
  • The regression line passes through the mean of X and Y variable values.
  • The regression constant b0 is equal to the y-intercept of the linear regression.
  • The regression coefficient b1 is the slope of the regression line. Its value is equal to the average change in the dependent variable (Y) for a unit change in the independent variable (X)

61 of 85

Regression Coefficient

The regression coefficient is given by the equation :

Y= B0+B1X

Where

B0 is a constant

B1 is the regression coefficient

Given below is the formula to find the value of the regression coefficient:

B1 = b1 = Σ[(xi - x̄)(yi - ȳ)] / Σ[(xi - x̄)²]

Where xi and yi are the observed data values,

and x̄ and ȳ are their mean values.

68 of 85

Example

  1. Calculate the regression coefficient and obtain the lines of regression for the following data.

  2. Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations from the actual means of X and Y.

  3. Obtain the regression equation of Y on X and estimate Y when X = 55 from the following data.

71 of 85

Least Squares method

74 of 85

Multiple Linear Regression

Multiple Linear Regression is one of the important regression algorithms which models the linear relationship between a single dependent continuous variable and more than one independent variable.

Some key points about MLR:

  • For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
  • Each feature variable must model the linear relationship with the dependent variable.
  • MLR tries to fit a regression line through a multidimensional space of data-points.

75 of 85

MLR equation:

The target variable (Y) in multiple linear regression is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since multiple linear regression is an extension of simple linear regression, the simple linear regression equation is extended with additional predictors:

Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn

Where,

Y = output/response variable,

b0, b1, b2, b3, ..., bn = coefficients of the model,

x1, x2, x3, x4, ... = the various independent/feature variables.
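
A short Python sketch of multiple linear regression with two feature variables, assuming scikit-learn; the feature and target values are synthetic, used only for illustration.

    # Multiple linear regression sketch: Y = b0 + b1*x1 + b2*x2 (+ noise).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    X = rng.rand(100, 2)                                   # columns are x1, x2
    y = 5 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.1, size=100)

    mlr = LinearRegression().fit(X, y)
    print("b0:", mlr.intercept_)
    print("b1, b2:", mlr.coef_)
    print(mlr.predict([[0.5, 0.5]]))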

76 of 85

Applications of Multiple Linear Regression:

Multiple Linear Regression has primarily two applications:

  • Effectiveness of Independent variable on prediction:
  • Predicting the impact of changes:
