1 of 65

Logistic Regression

Learning Objectives

  • Use logistic regression for binary classification
  • Implement logistic regression for binary classification
  • Address overfitting using regularization to improve model performance

Linear Regression for Classification

Classification

Quiz

  • Which of the following is an example of a classification task?
  • Decide if an animal is a cat or not a cat.
  • Estimate the weight of a cat based on its height.

Correct: This is an example of binary classification where there are two possible classes (True/False or Yes/No or 1/0).

Classification with Logistic Regression

Logistic Regression

Interpretation of Logistic Regression Output
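
The function g(z) discussed on this slide is the sigmoid, whose output is read as a probability. A minimal sketch (the helper name sigmoid is mine, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)); output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The model output f(x) = g(w.x + b) is interpreted as the probability
# that the label is 1, i.e. P(y = 1 | x).
print(sigmoid(0.0))    # 0.5: the model is maximally uncertain
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```

Note that g(z) never reaches exactly 0 or 1, and is never negative.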

Quiz

  • g(z) is near zero
  • g(z) is near negative one (-1)

Decision Boundary

Non-Linear Decision Boundaries

Quiz

  • Let’s say you are creating a tumor detection algorithm. Your algorithm will be used to flag potential tumors for future inspection by a specialist. What value should you use for a threshold?
  • High, say a threshold of 0.9?
  • Low, say a threshold of 0.2?

Correct: You would not want to miss a potential tumor, so you want a low threshold. A specialist will review the algorithm's output, which limits the harm of a 'false positive'. The key point of this question is that the threshold value does not need to be 0.5.
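
The thresholding idea can be sketched in a few lines; the helper name predict_flag and the probability values are illustrative, not from the slides:

```python
def predict_flag(probability, threshold=0.5):
    # Flag the case as positive when the model's estimated
    # probability meets or exceeds the chosen threshold.
    return 1 if probability >= threshold else 0

p = 0.3  # model's estimated probability that a tumor is present
print(predict_flag(p, threshold=0.5))  # 0: not flagged
print(predict_flag(p, threshold=0.2))  # 1: flagged for specialist review
```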

Quiz

  • Which is an example of a classification task?
  • Based on a patient's age and blood pressure, determine how much blood pressure medication (measured in milligrams) the patient should be prescribed.
  • Based on the size of each tumor, determine if each tumor is malignant (cancerous) or not.
  • Based on a patient's blood pressure, determine how much blood pressure medication (a dosage measured in milligrams) the patient should be prescribed.

This task predicts one of two classes, malignant or not malignant.

Quiz

  • g(z) is near negative one (-1)
  • g(z) is near zero (0)
  • g(z) is near one (1)
  • g(z) will be near 0.5

Quiz

  • Predict it is a cat if g(z) >= 0.5
  • Predict it is a cat if g(z) < 0.5
  • Predict it is a cat if g(z) < 0.7
  • Predict it is a cat if g(z) = 0.5

Think of g(z) as the probability that the photo is of a cat. When this number is at or above the threshold of 0.5, predict that it is a cat.

Quiz

  • True/False? No matter what features you use (including if you use polynomial features), the decision boundary learned by logistic regression will be a linear decision boundary.
  • True
  • False

The decision boundary can also be non-linear, as described in the lectures.
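
One way to see this: with polynomial features, the boundary z = 0 need not be a line. A minimal sketch with hand-picked weights (chosen for illustration, not learned):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# With polynomial features, z = w1*x1^2 + w2*x2^2 + b, so the decision
# boundary z = 0 is the circle x1^2 + x2^2 = 1 rather than a straight line.
w1, w2, b = 1.0, 1.0, -1.0

def predict(x1, x2, threshold=0.5):
    z = w1 * x1**2 + w2 * x2**2 + b
    return 1 if sigmoid(z) >= threshold else 0

print(predict(0.0, 0.0))  # 0: inside the unit circle
print(predict(2.0, 2.0))  # 1: outside the unit circle
```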

Cost Function for Logistic Regression

Training Set

Squared Error Cost

Logistic Loss Function

Cost

Quiz

  • Why is the squared error cost not used in logistic regression?
  • The non-linear nature of the model results in a “wiggly”, non-convex cost function with many potential local minima.
  • The mean squared error is used for logistic regression.

If using the mean squared error for logistic regression, the cost function is "non-convex", so it's more difficult for gradient descent to find an optimal value for the parameters w and b.
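
The loss that replaces squared error can be sketched as follows (the function name is mine; f is the model's output for one example):

```python
import math

def logistic_loss(f, y):
    """Loss for a single training example:
    -log(f) when y = 1, and -log(1 - f) when y = 0.
    Confident wrong predictions are penalized heavily."""
    if y == 1:
        return -math.log(f)
    return -math.log(1.0 - f)

print(logistic_loss(0.99, 1))  # small loss: confident and correct
print(logistic_loss(0.99, 0))  # large loss: confident but wrong
```

This loss, averaged over the training set, gives a convex cost, so gradient descent can reliably find the global minimum.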

Simplified Loss Function

Simplified Cost Function

Quiz

The second term of the expression is reduced to zero when the target equals 1.
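
That observation is what lets the two cases be merged into a single expression; a minimal sketch (the function name is mine):

```python
import math

def loss(f, y):
    # Simplified logistic loss: -y*log(f) - (1 - y)*log(1 - f).
    # When y = 1 the second term is zero; when y = 0 the first term is zero.
    return -y * math.log(f) - (1 - y) * math.log(1.0 - f)

print(loss(0.9, 1))  # only -log(0.9) remains
print(loss(0.9, 0))  # only -log(0.1) remains
```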

Quiz

  • In this lecture series, "cost" and "loss" have distinct meanings. Which one applies to a single training example?

  • Loss
  • Cost
  • Both loss and cost
  • Neither loss nor cost

In these lectures, loss is calculated on a single training example. It is worth noting that this definition is not universal. Other lecture series may have a different definition.
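
Under that convention, the cost is simply the average of the per-example losses; a minimal sketch (names are mine):

```python
import math

def loss(f, y):
    # Logistic loss for a single training example.
    return -y * math.log(f) - (1 - y) * math.log(1.0 - f)

def cost(predictions, targets):
    # Cost J(w, b): the average loss over all m training examples.
    m = len(targets)
    return sum(loss(f, y) for f, y in zip(predictions, targets)) / m

print(cost([0.9, 0.2], [1, 0]))  # average of two per-example losses
```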

Gradient Descent for Logistic Regression

Training Logistic Regression

Gradient Descent

Gradient Descent for Logistic Regression
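
A sketch of one gradient descent step. The update rule has the same algebraic form as for linear regression, but f(x) is now the sigmoid of w·x + b. All names here are mine, and the loops are left unvectorized for clarity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Model output f(x) = g(w . x + b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def gradient_descent_step(w, b, X, y, alpha):
    """One simultaneous update of w and b with learning rate alpha."""
    m, n = len(X), len(w)
    dj_dw, dj_db = [0.0] * n, 0.0
    for xi, yi in zip(X, y):
        err = predict(w, b, xi) - yi  # f(x^(i)) - y^(i)
        for j in range(n):
            dj_dw[j] += err * xi[j] / m
        dj_db += err / m
    w_new = [wj - alpha * dj for wj, dj in zip(w, dj_dw)]
    return w_new, b - alpha * dj_db

# Tiny one-feature example: labels switch from 0 to 1 between x=1 and x=2.
X, y = [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
w, b = [0.0], 0.0
for _ in range(1000):
    w, b = gradient_descent_step(w, b, X, y, alpha=0.5)
print(predict(w, b, [0.0]) < 0.5 < predict(w, b, [3.0]))  # True
```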

Quiz

  • Which of the following statements about gradient descent for logistic regression is more accurate?

Regularization to Reduce Overfitting

Regression Example

Classification

Quiz

  • Our goal when creating a model is to be able to use the model to predict outcomes correctly for new examples. A model which does this is said to generalize well.
  • When a model fits the training data well but does not work well with new examples that are not in the training set, this is an example of:
  • Underfitting (high bias)
  • None of these
  • A model that generalizes well (neither high variance nor high bias)
  • Overfitting (high variance)

Addressing Overfitting

Collect more training examples

Select features to include or exclude

Regularization

Quiz

  • Applying regularization, increasing the number of training examples, or selecting a subset of the most relevant features are methods for…
  • Addressing underfitting (high bias)
  • Addressing overfitting (high variance)

These methods can help the model generalize better to new examples that are not in the training set.

Intuition

Regularization
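
The regularized cost adds a penalty term that discourages large parameter values. A sketch assuming logistic regression with an L2 penalty (lambda is spelled lam because lambda is a Python keyword; all names are mine):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_cost(w, b, X, y, lam):
    """Average logistic loss plus the penalty (lam / (2m)) * sum(w_j^2).
    By convention, b is not regularized."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        f = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total += -yi * math.log(f) - (1 - yi) * math.log(1.0 - f)
    penalty = (lam / (2 * m)) * sum(wj ** 2 for wj in w)
    return total / m + penalty

X, y = [[1.0], [2.0]], [0, 1]
print(regularized_cost([3.0], 0.0, X, y, lam=0.0))   # unregularized cost
print(regularized_cost([3.0], 0.0, X, y, lam=10.0))  # larger: penalty added
```

A larger lam pushes the minimum toward smaller w_j, which smooths the learned boundary.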

Regularized Linear Regression

Implementing Gradient Descent
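
A sketch of one update step for regularized linear regression. Note that each w_j effectively shrinks by the factor (1 - alpha*lam/m) before the usual gradient step; names are mine:

```python
def regularized_step(w, b, X, y, alpha, lam):
    """One gradient descent step for regularized linear regression.
    dJ/dw_j = (1/m) * sum((f - y) * x_j) + (lam/m) * w_j.
    b is not regularized, so its update is unchanged."""
    m, n = len(X), len(w)
    dj_dw, dj_db = [0.0] * n, 0.0
    for xi, yi in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi  # linear model
        for j in range(n):
            dj_dw[j] += err * xi[j] / m
        dj_db += err / m
    w_new = [wj - alpha * (dj + (lam / m) * wj) for wj, dj in zip(w, dj_dw)]
    return w_new, b - alpha * dj_db

# With zero prediction error, only the shrinkage term acts on w:
w, b = regularized_step([1.0], 0.0, [[0.0]], [0.0], alpha=0.1, lam=1.0)
print(w, b)  # [0.9] 0.0
```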


Regularized Logistic Regression

Regularized Logistic Function

Quiz

  • For regularized logistic regression, how do the gradient descent update steps compare to the steps for linear regression?

Quiz

  • Which of the following can address overfitting?
  • Select a subset of the most relevant features.
  • Remove a random set of training examples.
  • Collect more training data.
  • Apply regularization.

Quiz

  • You fit logistic regression with polynomial features to a dataset, and your model looks like this.

  • What would you conclude? (Pick one)
  • The model has high bias (underfit). Thus, adding data is likely to help.
  • The model has high bias (underfit). Thus, adding data is, by itself, unlikely to help much.
  • The model has high variance (overfit). Thus, adding data is, by itself, unlikely to help much.
  • The model has high variance (overfit). Thus, adding data is likely to help.