Machine Learning Foundations
Calc II: Partial Derivatives & Integrals
Using Gradients in Python to Enable Algorithms to Learn from Data
Jon Krohn, Ph.D.
jonkrohn.com/talks
github.com/jonkrohn/ML-foundations
Machine Learning Foundations
Calc II: Partial Derivatives & Integrals
Slides: jonkrohn.com/talks
Code: github.com/jonkrohn/ML-foundations
Stay in Touch:
jonkrohn.com to sign up for email newsletter
linkedin.com/in/jonkrohn
jonkrohn.com/youtube
twitter.com/JonKrohnLearns
The Pomodoro Technique
Rounds of:
Questions are best handled at breaks, so please save them until then.
When people ask questions that have already been answered, do me a favor and let them know, politely providing the answer if appropriate.
Except during breaks, I recommend giving this lecture your full attention, as topics are not discrete: later material builds on earlier material.
POLL
What is your level of familiarity with Calculus?
POLL
What is your level of familiarity with Machine Learning?
ML Foundations Series
Calculus II builds upon and is foundational for:
Calc II: Partial Derivatives & Integrals
Calc II: Partial Derivatives & Integrals
Segment 1: Review of Introductory Calc
What Calculus Is
What Differential Calculus Is
What Integral Calculus Is
The Delta Method
Derivative Notation
Derivative of a Constant
Assuming c is a constant: dc/dx = 0
Intuition: A constant has no variation, so its slope is zero, e.g.:
The Power Rule
The Constant Product Rule
The Sum Rule
The Chain Rule
The Chain Rule
Power Rule on a Function Chain
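The rules above can be checked symbolically. A minimal SymPy sketch, using an illustrative function chain (3x² + 2)³ that is not from the slides:

```python
# SymPy check of the differentiation rules reviewed above.
import sympy as sp

x = sp.symbols('x')

# Power rule: d/dx x**n = n * x**(n-1)
assert sp.diff(x**4, x) == 4 * x**3

# Constant product rule and sum rule combined
assert sp.diff(5 * x**2 + 3 * x, x) == 10 * x + 3

# Chain rule via the power rule on a function chain:
# d/dx (3x**2 + 2)**3 = 3 * (3x**2 + 2)**2 * 6x
f = (3 * x**2 + 2)**3
assert sp.expand(sp.diff(f, x)) == sp.expand(18 * x * (3 * x**2 + 2)**2)
```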
Fitting a Line with Machine Learning
Machine Learning
Step 3: Partial Differentiation (the primary focus of Calc II)
Step 4: Descend gradient of cost C w.r.t. parameters m and b
Hands-on code demo: regression-in-pytorch.ipynb
gradient of C w.r.t. p = 0
Image © 2020 Pearson
Calc II: Partial Derivatives & Integrals
Segment 2: ML Gradients
Multivariate Functions
Even in a simple regression such as y = mx + b:
y is a function of multiple parameters, in this case m and b.
Therefore, we can't calculate the full derivative dy/dm or dy/db.
Partial Derivatives
Enable the calculation of derivatives of multivariate equations.
Consider the equation z = x² − y²
Hands-on demo: geogebra.org/3d
The partial derivative of z with respect to x is obtained by considering y to be a constant: ∂z/∂x = 2x
The slope of z along the x-axis is twice the x-axis value.
Hands-on code demo
Partial Derivatives
Reconsider z = x² − y² from the perspective of z w.r.t. y
Hands-on demo: geogebra.org/3d
The partial derivative of z with respect to y is obtained by considering x to be a constant: ∂z/∂y = −2y
The slope of z along the y-axis is twice the y-axis value
...and is inverted.
Hands-on code demo
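Both partials of z = x² − y² can be confirmed symbolically. A short sketch (the live demo uses geogebra.org/3d; this SymPy check is an added aside):

```python
# Partial derivatives of z = x**2 - y**2, checked with SymPy.
import sympy as sp

x, y = sp.symbols('x y')
z = x**2 - y**2

dz_dx = sp.diff(z, x)  # treat y as a constant
dz_dy = sp.diff(z, y)  # treat x as a constant

print(dz_dx)  # 2*x  : slope along the x-axis is twice the x value
print(dz_dy)  # -2*y : slope along the y-axis is twice the y value, inverted
```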
Solutions
Solutions
Solutions
Partial Derivatives
Hands-on code demo
Partial Derivatives
Hands-on code demo
Exercises
Find all the partial derivatives of the following functions:
Solutions
Solutions
Solutions
Partial Derivative Notation
The Chain Rule
Let’s say:
Recall that the chain rule for full derivatives would be:
With univariate functions, the partial derivative is identical:
The Chain Rule
With a multivariate function, the partial derivative is more interesting:
The Chain Rule
With multiple multivariate functions, it gets really interesting:
The Chain Rule
Generalizing completely:
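The generalized chain rule can be sanity-checked with SymPy. The functions u, v, and y below are illustrative choices, not from the slides:

```python
# Verifying the multivariate chain rule:
# dy/dx = (dy/du)(du/dx) + (dy/dv)(dv/dx)
import sympy as sp

x = sp.symbols('x')
u = x**2          # u(x)
v = sp.sin(x)     # v(x)
y = u * v + u**2  # y(u, v)

# Direct differentiation
direct = sp.diff(y, x)

# Chain rule, term by term, with u and v as intermediate symbols
us, vs = sp.symbols('u v')
y_uv = us * vs + us**2
chain = sp.diff(y_uv, us) * sp.diff(u, x) + sp.diff(y_uv, vs) * sp.diff(v, x)
chain = chain.subs({us: u, vs: v})

assert sp.simplify(direct - chain) == 0
```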
Exercises
Find all the partial derivatives of y, where:
Solutions
Solutions
Solutions
Recalling Machine Learning
Step 3: Automatic differentiation
Step 4: Descend gradient of cost C w.r.t. parameters m and b
Hands-on code demo: single-point-regression-gradient NB
gradient of C w.r.t. p = 0
Image © 2020 Pearson
Quadratic Cost w.r.t. Predicted y
Predicted y w.r.t. Model Parameters
Quadratic Cost w.r.t. Model Parameters
Hands-on code demo
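The chain of partials above can be sketched in plain Python. The course notebook (single-point-regression-gradient) uses PyTorch autograd; the point and parameter values here are invented for illustration:

```python
# Single-point quadratic cost and its gradient via the chain rule.
x_i, y_i = 7.0, 3.0   # a single (x, y) training point (hypothetical)
m, b = 0.9, 0.1       # current parameter values (hypothetical)

yhat = m * x_i + b            # predicted y
C = (yhat - y_i) ** 2         # quadratic cost

# Chain rule: dC/dyhat = 2(yhat - y); dyhat/dm = x; dyhat/db = 1
dC_dm = 2 * (yhat - y_i) * x_i
dC_db = 2 * (yhat - y_i)

print(dC_dm, dC_db)
```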
Recalling Machine Learning
Step 3: Determine gradient of cost C w.r.t. parameters m and b
Step 4: Descend gradient
Image © 2020 Pearson
gradient of C w.r.t. p = 0
∇C: the Gradient of Cost
Image © 2020 Pearson
Hands-on code demo: batch-regression-gradient NB
MSE w.r.t. Predicted y
MSE w.r.t. Model Parameters
Hands-on code demo
Regression Line after 1000 Epochs
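Steps 3 and 4 over a whole batch can be sketched without autograd by using the MSE gradients derived above. The course notebook (batch-regression-gradient) uses PyTorch; the synthetic data and hyperparameters here are invented:

```python
# Batch gradient descent on MSE for y = mx + b, with hand-derived gradients.
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=x.shape)  # true m=2, b=1, plus noise

m, b = 0.0, 0.0
lr = 0.01
n = len(x)

for epoch in range(1000):
    yhat = m * x + b
    # MSE C = (1/n) * sum((yhat - y)**2); gradient via the chain rule:
    dC_dm = (2 / n) * np.sum((yhat - y) * x)
    dC_db = (2 / n) * np.sum(yhat - y)
    m -= lr * dC_dm
    b -= lr * dC_db

print(round(m, 2), round(b, 2))  # should land near the true values m=2, b=1
```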
Backpropagation
Chain rule of partial derivatives of cost w.r.t. model parameters extends to deep neural networks, which may have 1000s of layers:
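Backpropagation is exactly this chain of partials, applied layer by layer from the cost backward. A two-"layer" numeric sketch with invented values (real networks simply repeat the pattern):

```python
# Forward pass through two tanh layers, then a backward pass via the chain rule.
import math

x, w1, w2, target = 0.5, 0.8, 1.2, 1.0
a1 = math.tanh(w1 * x)        # layer 1 activation
a2 = math.tanh(w2 * a1)       # layer 2 activation
C = (a2 - target) ** 2        # quadratic cost

# Backward pass: each factor is one link in the chain
dC_da2 = 2 * (a2 - target)
da2_dz2 = 1 - a2 ** 2                      # tanh'(z2), where z2 = w2 * a1
dC_dw2 = dC_da2 * da2_dz2 * a1
da1_dz1 = 1 - a1 ** 2                      # tanh'(z1), where z1 = w1 * x
dC_dw1 = dC_da2 * da2_dz2 * w2 * da1_dz1 * x

print(dC_dw1, dC_dw2)
```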
Higher-Order Derivatives
Higher-Order Partial Derivatives
In ML, higher-order derivatives are used to accelerate gradient descent (optimization).
Consider the following first-order partial derivatives...
Higher-Order Partial Derivatives
Higher-Order Partial Derivatives
Higher-Order Partial Derivative Notation
Exercise
Find all the second-order partial derivatives of z = x³ + 2xy.
Solution
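The exercise can also be checked with SymPy:

```python
# Second-order partial derivatives of z = x**3 + 2*x*y.
import sympy as sp

x, y = sp.symbols('x y')
z = x**3 + 2 * x * y

print(sp.diff(z, x, x))  # d2z/dx2  = 6*x
print(sp.diff(z, x, y))  # d2z/dxdy = 2
print(sp.diff(z, y, x))  # d2z/dydx = 2 (mixed partials agree)
print(sp.diff(z, y, y))  # d2z/dy2  = 0
```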
Calc II: Partial Derivatives & Integrals
Segment 3: Integrals
Supervised Learning
Accuracy at a Single Threshold
The Confusion Matrix
Four Hot Dog Predictions
Image © 2020 Pearson
Receiver-Operating Characteristic
The ROC Curve
Image © 2020 Pearson
The ROC Curve
Integral Calculus
Integral Calculus Applications in ML
Find area under the curve:
Image © 2020 Pearson
dx Slice Width
dx indicates that the slice width (Δx) approaches zero
Integral Notation
The Power Rule
The Constant Multiple Rule
The Sum Rule
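The integration rules above can be verified symbolically. A minimal SymPy sketch with illustrative integrands:

```python
# SymPy check of the integration rules (power, constant multiple, sum).
import sympy as sp

x = sp.symbols('x')

# Power rule: integral of x**n dx = x**(n+1)/(n+1) + C (SymPy omits the constant)
assert sp.integrate(x**3, x) == x**4 / 4

# Constant multiple rule and sum rule combined
assert sp.integrate(5 * x**2 + 3, x) == sp.Rational(5, 3) * x**3 + 3 * x
```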
Exercises
Solutions
Definite Integrals
Hands-on code demo
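A definite integral can be evaluated symbolically and cross-checked with the narrowing-slice idea from earlier. The integrand x² over [1, 2] is an illustrative choice, not from the slides:

```python
# Definite integral: symbolic answer vs. a Riemann sum as dx -> 0.
import sympy as sp

x = sp.symbols('x')
exact = sp.integrate(x**2, (x, 1, 2))  # = 7/3

# Numerical cross-check: left Riemann sum with 100,000 slices
n = 100_000
dx = 1 / n
riemann = sum(((1 + i * dx) ** 2) * dx for i in range(n))

print(exact, riemann)  # 7/3 vs. approximately 2.3333
```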
Exercise
Evaluate the following expression using both pencil and Python:
Solution
Hands-on code demo
Area Under the ROC Curve
Hands-on code demo
Image © 2020 Pearson
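The area under the ROC curve is a definite integral approximated numerically. A hedged sketch with the trapezoidal rule; the (FPR, TPR) points below are invented, and the course demo likely uses scikit-learn instead:

```python
# Area under an ROC curve via the trapezoidal rule.
fpr = [0.0, 0.0, 0.5, 1.0]  # false positive rates at decreasing thresholds
tpr = [0.0, 0.5, 1.0, 1.0]  # corresponding true positive rates

# Sum the area of each trapezoidal slice between consecutive FPR values
auc = sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
          for i in range(len(fpr) - 1))

print(auc)  # 0.875 for these four points
```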
Resources for Further Study
Next Subject: Probability & Information Theory
Apply calculus to ascertain how much meaningful signal is present in data.
Learn the probability theory-based foundations of stats and ML.
POLL with Multiple Answers Possible
What follow-up topics interest you most?
Stay in Touch
jonkrohn.com to sign up for email newsletter
linkedin.com/in/jonkrohn
youtube.com/c/JonKrohnLearns
twitter.com/JonKrohnLearns