
Logistic Regression

Dr. Dinesh Kumar Vishwakarma

Professor,

Department of Information Technology,

Delhi Technological University, Delhi


Logistic Regression: Intro

  • Logistic regression models the probability that a categorical outcome occurs, using the logistic (sigmoid) function to map a linear combination of predictors to a value between 0 and 1.


Logistic Regression


Linear vs Logistic


Example

  • Based on the CGPA of a UG student, will the student get admission to a PG program? Yes/No.
  • The values of y are 1 (success) or 0 (failure), while the values of x range over a continuum. Another binary outcome: raining or not raining.
  • A categorical variable can also divide observations into more than two classes. For example, an action on a stock such as holding, selling, or buying gives a categorical variable with 3 categories: the “hold” class, the “sell” class, and the “buy” class.
  • The model can be used to classify a new observation into one of the classes, based on the values of its predictor variables (this task is called “classification”).


Applications

  • Logistic regression is used in applications such as:
    • Classifying customers as returning or non-returning (classification)
    • Finding factors that differentiate between male and female top executives (profiling)
    • Predicting the approval or disapproval of a loan based on information such as credit scores (classification).
  • Popular examples of binary response outcomes are
    • success/failure, yes/no, buy/don't buy, default/don't default, and survive/die.
  • We code the values of a binary response Y as 0 and 1.


Introduction to Logistic Regression

  • Most important model for categorical response (yi) data
  • Categorical response with 2 levels (binary: 0 and 1)
  • Categorical response with ≥ 3 levels (nominal or ordinal)
  • Predictor variables (xi) can take on any form: binary, categorical, and/or continuous.


Logistic Curve

[Figure: sigmoid curve; probability (0.0 to 1.0) on the vertical axis versus x (1 to 21) on the horizontal axis]
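A minimal sketch that reproduces this kind of plot; NumPy, matplotlib, and the parameter values are assumptions for illustration, not taken from the slide:

```python
import numpy as np
import matplotlib.pyplot as plt

# Logistic (sigmoid) function over the x range shown on the slide.
x = np.linspace(1, 21, 200)
beta0, beta1 = -11.0, 1.0  # illustrative values that center the curve near x = 11
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

plt.plot(x, p)
plt.xlabel("x")
plt.ylabel("Probability")
plt.title("Sigmoid curve")
plt.show()
```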


Logistic Function
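In standard form (and consistent with the worked example later in the deck), the logistic function maps the linear predictor β0 + β1X to a probability:

$$P(\text{Success} \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$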


Logistic Function

[Figure: the logistic function, P(“Success” | X) plotted against X]


Logit Transformation

  • The logistic regression model is given by

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

  • which is equivalent to

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$

This is called the logit transformation.


Logit Transformation

  • Logistic regression models transformed probabilities, called logits:

$$\text{logit}(p_i) = \log\!\left(\frac{p_i}{1-p_i}\right)$$

  • where
      • i indexes all cases (observations),
      • pi is the probability that the event (a sale, for example) occurs in the ith case, and
      • log is the natural log (to the base e).
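A small sketch showing that the logit and the logistic (sigmoid) function are inverses; plain Python, with illustrative function names:

```python
import math

def logit(p):
    # Log odds: the transformation that logistic regression models linearly.
    return math.log(p / (1 - p))

def sigmoid(z):
    # Inverse of the logit: recovers the probability from the log odds.
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
z = logit(p)
print(f"logit({p}) = {z:.4f}")                 # 1.3863
print(f"sigmoid({z:.4f}) = {sigmoid(z):.4f}")  # 0.8000
```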


Comparing LP and Logit Models

[Figure: the linear probability (LP) model versus the logit model, plotted on a probability scale from 0 to 1]


Assumption

  • The logit of pi is assumed to be a linear function of the predictor:

$$\text{logit}(p_i) = \log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i$$


Logistic regression model with a single continuous predictor

  • With a single continuous predictor x, the model is

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x \qquad\Longleftrightarrow\qquad p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
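As an illustration, a minimal scikit-learn sketch of fitting this single-predictor model; the library choice and the toy data are assumptions, and scikit-learn applies L2 regularization by default, so its estimates differ from an unregularized maximum-likelihood fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # single continuous predictor
y = np.array([0, 0, 0, 1, 1])                      # binary response

model = LogisticRegression()
model.fit(X, y)

b0 = model.intercept_[0]  # estimated beta_0
b1 = model.coef_[0, 0]    # estimated beta_1
print(f"logit(p) = {b0:.3f} + {b1:.3f} * x")
print("P(y = 1 | x = 3.5) =", model.predict_proba([[3.5]])[0, 1])
```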


The Logistic Regression Model

  • Let p denote P[y = 1] = P[Success]. When β1 > 0, this quantity increases with the value of x.

The ratio

$$\frac{p}{1-p}$$

is called the odds. This quantity also increases with the value of x, ranging from zero to infinity.

The quantity

$$\log\!\left(\frac{p}{1-p}\right)$$

is called the log odds.


Example: odds and log odds

Suppose a die is rolled:

Success = “roll a six”, p = 1/6

The odds:

$$\frac{p}{1-p} = \frac{1/6}{5/6} = \frac{1}{5} = 0.2$$

The log odds:

$$\log(0.2) \approx -1.61$$
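A quick numeric check of these values in plain Python:

```python
import math

p = 1 / 6                  # P(roll a six)
odds = p / (1 - p)         # odds of success
log_odds = math.log(odds)  # natural log of the odds

print(f"odds = {odds:.4f}")          # 0.2000
print(f"log odds = {log_odds:.4f}")  # -1.6094
```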


The Logistic Regression Model

The model assumes the log odds is linearly related to x:

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$

i.e., in terms of the odds:

$$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$$


The Logistic Regression Model

Solving for p in terms of x:

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \qquad\text{or, equivalently,}\qquad p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
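Writing out the algebra between the two forms:

$$\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x
\;\Longrightarrow\; \frac{p}{1-p} = e^{\beta_0 + \beta_1 x}
\;\Longrightarrow\; p = (1 - p)\,e^{\beta_0 + \beta_1 x}
\;\Longrightarrow\; p\left(1 + e^{\beta_0 + \beta_1 x}\right) = e^{\beta_0 + \beta_1 x}
\;\Longrightarrow\; p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}.$$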


Interpretation of the parameter β0

[Figure: logistic curve of p versus x]

  • β0 determines the intercept of the curve: at x = 0, p = e^{β0}/(1 + e^{β0}).


Interpretation of the parameter β1

[Figure: logistic curve of p versus x, with the point where p = 0.50 marked]

  • β1 (along with β0) determines where p is 0.50: p = 0.50 when x = −β0/β1.


Interpretation of the parameter β1 (continued)

Also, since

$$\frac{dp}{dx} = \beta_1\, p\,(1-p),$$

we have, when p = 0.50,

$$\frac{dp}{dx} = \frac{\beta_1}{4},$$

which is the rate of increase in p with respect to x when p = 0.50.
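The derivative used above follows from the model by the chain rule:

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
\quad\Longrightarrow\quad
\frac{dp}{dx} = \frac{\beta_1\, e^{\beta_0 + \beta_1 x}}{\left(1 + e^{\beta_0 + \beta_1 x}\right)^{2}} = \beta_1\, p\,(1-p),$$

and at p = 0.50 this equals β1 × 0.5 × 0.5 = β1/4.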


Interpretation of the parameter β1

[Figure: logistic curves of p versus x for different values of β1]

  • β1 determines the slope of the curve where p is 0.50 (the slope there is β1/4).


Binary Classification

  • In logistic regression we take two steps:
    • The first step yields estimates of the probabilities of belonging to each class. In the binary case we get an estimate of P(Y = 1),
    • the probability of belonging to class 1 (which also gives us the probability of belonging to class 0, namely 1 − P(Y = 1)).
  • In the next step we use
    • a cutoff value on these probabilities in order to classify each case into one of the classes (see the sketch after this list).
    • A cutoff of 0.5 means that cases with an estimated probability P(Y = 1) > 0.5 are classified as belonging to class 1,
    • whereas cases with P(Y = 1) ≤ 0.5 are classified as belonging to class 0.
    • The cutoff need not be set at 0.5.
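A minimal sketch of the two-step procedure; the coefficient values and data here are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps a linear predictor to a probability.
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: estimate P(Y = 1) for each case (illustrative coefficients).
beta0, beta1 = -4.0, 1.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p_hat = sigmoid(beta0 + beta1 * x)

# Step 2: apply a cutoff to turn probabilities into class labels.
cutoff = 0.5
y_pred = (p_hat > cutoff).astype(int)

print(np.round(p_hat, 4))  # estimated P(Y = 1) per case
print(y_pred)              # predicted classes at the 0.5 cutoff
```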


Types of Logistic Regression

  • Binary Logistic Regression
    • The categorical response has only two possible outcomes. Example: spam or not spam.
  • Multinomial Logistic Regression
    • Three or more categories without ordering. Example: predicting which diet is preferred (Veg, Non-Veg, Vegan); see the sketch after this list.
  • Ordinal Logistic Regression
    • Three or more categories with ordering. Example: a movie rating from 1 to 5.
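For contrast, a minimal scikit-learn sketch of the binary and multinomial cases; the toy data and label encodings are assumptions, and recent scikit-learn versions fit a multinomial model automatically when the target has more than two classes. Ordinal logistic regression needs a dedicated implementation (for example, OrderedModel in statsmodels) and is omitted here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Binary: two outcomes (e.g. 1 = spam, 0 = not spam).
Xb = np.array([[0.5], [1.0], [2.5], [3.0]])
yb = np.array([0, 0, 1, 1])
binary = LogisticRegression().fit(Xb, yb)

# Multinomial: three unordered classes (0 = Veg, 1 = Non-Veg, 2 = Vegan).
Xm = np.array([[0.2], [0.4], [1.1], [1.3], [2.2], [2.4]])
ym = np.array([0, 0, 1, 1, 2, 2])
multi = LogisticRegression().fit(Xm, ym)

print(binary.predict_proba([[1.5]]))  # [P(class 0), P(class 1)]
print(multi.predict_proba([[1.2]]))   # probabilities over the 3 classes
```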


Example

  • Suppose we have the following dataset where the X column represents the feature (input), and Y is the target label (0 or 1) indicating whether a customer will purchase a product:

  • We want to build a logistic regression model to predict the probability that a customer will purchase the product for a new input X = 3.5. The model is given by the following logistic function:

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

  • The estimated coefficients for the model are:
  • β0 = −4 (intercept)
  • β1 = 1 (coefficient for feature X)

  X   Y
  1   0
  2   0
  3   0
  4   1
  5   1


Solution

  • Calculate the probability for X = 3.5:

$$z = \beta_0 + \beta_1 X = -4 + 1 \times 3.5 = -0.5$$

$$P(Y = 1 \mid X = 3.5) = \frac{1}{1 + e^{0.5}} \approx \frac{1}{1 + 1.6487} \approx 0.3775$$

  • The predicted probability that the customer will purchase the product when X = 3.5 is approximately 0.3775 (37.75%). Since this is below the 0.5 cutoff, the model predicts that the customer will not purchase the product.
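A short sketch to reproduce this calculation in plain Python; the function name is illustrative:

```python
import math

def predict_proba(x, beta0=-4.0, beta1=1.0):
    # Logistic model P(Y = 1 | x) with the example's coefficients.
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

p = predict_proba(3.5)
print(f"P(Y = 1 | X = 3.5) = {p:.4f}")          # 0.3775
print("predicted class:", 1 if p > 0.5 else 0)  # 0 -> will not purchase
```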


Advantages

  • Interpretable: Coefficients β0 and β1 have a clear meaning in terms of odds ratios.
  • Probabilistic Output: Predicts probabilities rather than binary outcomes, allowing for better insight into model confidence.
  • Efficiency: Works well for linearly separable data and is computationally efficient.


Limitations

  • Linearity Assumption: Logistic regression assumes a linear relationship between the log-odds of the response and the input features.
  • Not for Complex Boundaries: Fails when the decision boundary is highly complex or non-linear.
  • Multicollinearity: If features are highly correlated, logistic regression can produce unreliable estimates.