Gradient Descent and Logistic Regression
CSE 447 / 517
January 16th, 2025 (Week 2)
Logistics
Agenda
Feature Vectors
A feature vector 𝜙(x) = [𝜙_1(x), ..., 𝜙_d(x)] “embeds” the input x in d-dimensional space.
Gradient Descent
Goal: Given a dataset of n labeled examples (x_i, y_i), find the weights θ* by maximum likelihood estimation.
apply the definition of p_LR
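The maximum likelihood goal can be written out as a short derivation. This is a sketch under assumptions the slides do not state: labels y_i in {0, 1}, n independent examples, and p_LR(y = 1 | x; θ) = σ(θ · 𝜙(x)). The symbols n, L, and σ are notation chosen here for illustration.

```latex
\begin{align*}
\theta^*
  &= \arg\max_{\theta} \prod_{i=1}^{n} p_{\mathrm{LR}}(y_i \mid x_i; \theta) \\
  &= \arg\max_{\theta} \sum_{i=1}^{n} \log p_{\mathrm{LR}}(y_i \mid x_i; \theta) \\
  &= \arg\min_{\theta} L(\theta), \\
L(\theta)
  &= -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma\bigl(\theta \cdot \phi(x_i)\bigr)
     + (1 - y_i) \log\bigl(1 - \sigma(\theta \cdot \phi(x_i))\bigr) \Bigr]
\end{align*}
```

Taking the log turns the product into a sum without moving the maximizer, and negating it gives a loss to minimize.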
Gradient Descent
Big idea: minimize the loss by “optimization along the (negative) gradient”.
See Eisenstein pg. 37
[Figure: loss plotted against 𝞱]
The “gradient” (in one dimension, the first derivative) is the slope of the loss curve.
Gradient Descent
Step 1: finding the gradient.
Start from the loss function:
Differentiate with respect to the parameters:
Simplify the gradient:
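The three parts of Step 1 can be sketched as a worked derivation. It assumes labels y_i in {0, 1} and the negative log-likelihood loss, which the slides do not state explicitly; z_i, L, and σ are notation introduced here. It uses the identity σ'(z) = σ(z)(1 − σ(z)).

```latex
\begin{align*}
\text{Let } z_i &= \theta \cdot \phi(x_i). \\
L(\theta) &= -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(z_i) + (1 - y_i) \log\bigl(1 - \sigma(z_i)\bigr) \Bigr] \\
\nabla_{\theta} L(\theta)
  &= -\sum_{i=1}^{n} \Bigl[ y_i \bigl(1 - \sigma(z_i)\bigr) - (1 - y_i)\, \sigma(z_i) \Bigr] \phi(x_i) \\
  &= \sum_{i=1}^{n} \bigl( \sigma(z_i) - y_i \bigr)\, \phi(x_i)
\end{align*}
```

Under these assumptions, each example contributes its feature vector scaled by the prediction error σ(z_i) − y_i.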
Gradient Descent
Step 2: take a step.
Update the parameters: θ ← θ − α ∇loss(θ),
where α is the learning rate.
Step 3: repeat Steps 1-2 until convergence (i.e., the loss essentially stops decreasing).
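The three steps can be sketched in plain Python. This is a minimal illustration, not the course's reference implementation: it assumes labels in {0, 1}, the negative log-likelihood loss, and a made-up toy dataset. The function names, `tol`, and `max_steps` are hypothetical.

```python
import math

def sigmoid(z):
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_gradient(theta, features, labels):
    """Negative log-likelihood and its gradient (assumes labels in {0, 1})."""
    loss = 0.0
    grad = [0.0] * len(theta)
    for phi, y in zip(features, labels):
        p = sigmoid(sum(t * f for t, f in zip(theta, phi)))
        p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        loss -= y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe)
        for j, f in enumerate(phi):
            grad[j] += (p - y) * f
    return loss, grad

def gradient_descent(features, labels, alpha=0.1, max_steps=1000, tol=1e-6):
    theta = [0.0] * len(features[0])
    prev_loss = float("inf")
    for _ in range(max_steps):
        # Step 1: find the gradient at the current parameters.
        loss, grad = loss_and_gradient(theta, features, labels)
        # Step 3: stop once the loss has essentially stopped decreasing.
        if prev_loss - loss < tol:
            break
        # Step 2: take a step along the negative gradient.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        prev_loss = loss
    return theta, loss

# Made-up toy data: each feature vector is [bias, x]; labels are not separable.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
features = [[1.0, x] for x in xs]
labels = [0, 0, 1, 0, 1, 1]

theta, final_loss = gradient_descent(features, labels)
print("theta:", theta)
print("final loss:", final_loss)
```

The stopping rule compares consecutive loss values. That is only one possible reading of "basically stops decreasing"; a cap on the number of steps is kept as a safeguard.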
Gradient Descent
Things to consider: how do we choose the learning rate? It is another hyperparameter!
From https://www.jeremyjordan.me/nn-learning-rate/
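To see why the learning rate matters, here is a small experiment on a toy loss, f(θ) = θ², chosen for illustration only. The three values of alpha are arbitrary picks meant to show a step that is too small, one that works well, and one that is too large.

```python
def run_gradient_descent(alpha, steps=20, theta0=1.0):
    """Minimize the toy loss f(theta) = theta**2, whose gradient is 2 * theta."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta
    return theta

for alpha in (0.001, 0.1, 1.1):
    theta = run_gradient_descent(alpha)
    print(f"alpha={alpha}: theta after 20 steps = {theta:.4f}")
```

With the smallest alpha, θ barely moves from its starting point. With the middle value, it approaches the minimum at 0. With the largest, each step overshoots and θ moves further from 0 every iteration.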
*Stochastic* Gradient Descent
Stochastic Gradient Descent
We can prove that SGD will eventually get very close to a global minimum of a convex objective function. What do you think will happen if we apply SGD to a function that is not convex?
On a non-convex function, SGD tends to converge to a local minimum; it offers no guarantee of reaching a global minimum.
Image source: Wikipedia
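The dependence on the starting point can be illustrated with a small experiment. For simplicity this sketch uses plain (deterministic) gradient descent rather than SGD, on a made-up non-convex function f(θ) = θ⁴ − 3θ² + θ, which has two minima of different depth.

```python
def f(theta):
    """A made-up non-convex function with two local minima."""
    return theta**4 - 3.0 * theta**2 + theta

def grad_f(theta):
    """Derivative of f."""
    return 4.0 * theta**3 - 6.0 * theta + 1.0

def descend(theta0, alpha=0.01, steps=1000):
    """Plain gradient descent from a given starting point."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad_f(theta)
    return theta

left = descend(-2.0)   # starts on the left side
right = descend(2.0)   # starts on the right side
print(f"start=-2.0 -> theta={left:.3f}, f={f(left):.3f}")
print(f"start=+2.0 -> theta={right:.3f}, f={f(right):.3f}")
```

The two runs settle in different minima, and only the run started on the left finds the lower one. The update rule uses only local slope information, so it cannot tell that a deeper minimum exists elsewhere.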
Logistic Regression
A logistic regression model usually has:
- A collection of feature functions, denoted 𝜙_1, ..., 𝜙_d, each mapping an input x to a real number
- A coefficient or “weight” for every feature, denoted θ_1, ..., θ_d, each a real number
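The two ingredients can be sketched as follows. The feature functions and weight values here are invented for illustration (a toy sentiment setup over text inputs) and are not taken from the lecture.

```python
# Hypothetical feature functions: each maps an input string x to a real number.
def bias(x):
    return 1.0

def contains_great(x):
    return 1.0 if "great" in x.lower().split() else 0.0

def contains_awful(x):
    return 1.0 if "awful" in x.lower().split() else 0.0

feature_functions = [bias, contains_great, contains_awful]

# One weight per feature function (made-up values).
weights = [0.0, 2.0, -2.0]

def feature_vector(x):
    """Stack the feature function outputs into a d-dimensional vector."""
    return [phi_j(x) for phi_j in feature_functions]

def score(x):
    """Dot product of the weights and the feature vector."""
    return sum(theta_j * value for theta_j, value in zip(weights, feature_vector(x)))

print(feature_vector("A great movie"), score("A great movie"))
print(feature_vector("An awful movie"), score("An awful movie"))
```

Here d = 3, and the lists `feature_functions` and `weights` must stay the same length so that every feature has exactly one weight.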
Binary Logistic Regression
The labels are arbitrary and can be changed, as long as the classify() function is modified accordingly!
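One way to picture this is a classify() that takes the two labels as parameters. The slides do not show the actual classify() function, so the signature, default labels, and 0.5 threshold below are all assumptions.

```python
import math

def sigmoid(z):
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(score, positive_label=1, negative_label=0, threshold=0.5):
    """Return one of two labels based on the predicted probability.

    `score` stands for the model's raw score for an input (assumed to be
    the dot product of weights and features).
    """
    probability = sigmoid(score)
    return positive_label if probability >= threshold else negative_label

print(classify(2.0))                # labels {0, 1}
print(classify(-2.0, +1, -1))       # labels {+1, -1}
print(classify(2.0, "pos", "neg"))  # string labels
```

The model's probability is the same in all three calls; only the names attached to the two outcomes change.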
Binary Logistic Regression
from Lecture Slide 40
apply the definition of the score function
Symbol | Definition | Scalar / Vector |
x | Input | Vector |
y | Output | Scalar |
𝞱 | Parameters | Vector |
𝜙(x) | Feature vector (Lecture Slide 31) | Vector |
apply the definition of the standard logistic function
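The annotations on this slide describe a chain of substitutions, which can be sketched as below. The starting line is presumed to be what Lecture Slide 40 states. Writing the positive class as Y = 1 and defining the score as θ · 𝜙(x) are assumptions, consistent with the symbol table above.

```latex
\begin{align*}
p_{\mathrm{LR}}(Y = 1 \mid x; \theta)
  &= \sigma\bigl(\mathrm{score}(x; \theta)\bigr) \\
  &= \sigma\bigl(\theta \cdot \phi(x)\bigr)
     && \text{definition of the score function} \\
  &= \frac{1}{1 + \exp\bigl(-\theta \cdot \phi(x)\bigr)}
     && \text{definition of the standard logistic function}
\end{align*}
```

Since θ and 𝜙(x) are both vectors, their dot product is a scalar, so the result is a single probability between 0 and 1.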
CSE447: Project 0, Python and PyTorch Tutorial + Review
https://colab.research.google.com/drive/1PAUlmIZMcxsKME0UlBCLf8HtQU2rcs5Q