300 Years Later, Newton's Method Gets an Update
Jeffrey Zhang, Yale BIDS
Joint work with Amir Ali Ahmadi and Abraar Chaudhry
AAMU-UAH Joint Seminar, 4/18/2025
Newton’s Method
Newton’s method is an iterative optimization algorithm that minimizes the second-order Taylor expansion of the objective at each iteration
The Newton iterate: $x_{k+1} = x_k - \left(\nabla^2 f(x_k)\right)^{-1} \nabla f(x_k)$
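A minimal numpy sketch of the iteration (ours, not from the slides; `grad` and `hess` are assumed callables for a hypothetical objective):

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Plain Newton's method: repeatedly minimize the local quadratic model."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Newton step: solve  hess(x) * step = grad(x)
        x = x - np.linalg.solve(hess(x), g)
    return x

# Hypothetical test objective f(x, y) = (x - 1)^2 + 10 * (y + 2)^2
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton(grad, hess, [5.0, 5.0]))  # -> approximately [1, -2]
```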
Convergence of Newton’s Method
Theorem: Suppose $f$ is twice continuously differentiable, $x^*$ is a local minimizer with $\nabla^2 f(x^*) \succ 0$, and $\nabla^2 f$ is Lipschitz continuous near $x^*$. Then, from any starting point sufficiently close to $x^*$, Newton’s method converges to $x^*$ at a quadratic rate.
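Quantitatively (the standard form of the rate, added for reference): there is a constant $C > 0$ such that, for iterates near $x^*$,

```latex
\|x_{k+1} - x^*\| \;\le\; C\,\|x_k - x^*\|^2 .
```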
Newton’s method doesn’t always work!
Functions with S-shaped gradients are difficult for Newton’s method
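A standard illustration (ours, not from the slides): take $f$ with $f'(x) = \arctan(x)$, so the gradient is S-shaped. The Newton step is $x_{k+1} = x_k - (1 + x_k^2)\arctan(x_k)$, which overshoots once $|x_k|$ is large; the iteration diverges for $|x_0| \gtrsim 1.39$:

```python
import numpy as np

# f'(x) = arctan(x) (S-shaped), f''(x) = 1 / (1 + x^2).
# Newton step: x - arctan(x) * (1 + x^2), which overshoots for large |x|.
def newton_1d(x, steps=8):
    traj = [x]
    for _ in range(steps):
        x = x - np.arctan(x) * (1 + x ** 2)
        traj.append(x)
    return traj

print(newton_1d(1.0))  # converges to the minimizer at 0
print(newton_1d(2.0))  # iterates alternate in sign and blow up
```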
Higher-order information in Newton’s method
Using higher-order derivative information can improve Newton’s method by producing closer local approximations of the objective (the degree-$d$ Taylor model is written out below)
Empirically, using higher-order information can yield faster, more stable algorithms
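In standard multi-index notation (the name $f_{x_k,d}$ is ours), the degree-$d$ Taylor approximation of $f$ around the current iterate $x_k$ is

```latex
f_{x_k,d}(y) \;=\; \sum_{|\alpha| \le d} \frac{\partial^{\alpha} f(x_k)}{\alpha!}\,(y - x_k)^{\alpha} .
```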
Why haven’t we done this?
Optimizing high-degree polynomials is hard!
Some NP-hard problems:
- Deciding whether a polynomial of degree 4 is convex
- Deciding whether a given point is a local minimum of a polynomial of degree 4
- Finding the minimum of a polynomial of degree 4
A bridge to higher-order methods
If we had algorithms to minimize these high-degree approximations, they would immediately yield higher-order Newton methods
Example: a third-order Newton method is possible!
Theorem (Silina, JZ ’22)
Theorem (Ahmadi, JZ ’22): Local minima of cubic polynomials can be found by solving SDPs of size polynomial in the number of variables.
A different higher-order method
Outline
Sums of Squares
Sum of Squares and Polynomial Optimization
A polynomial $p$ is a sum of squares (sos) if $p = \sum_i q_i^2$ for some polynomials $q_i$; being sos certifies nonnegativity
Checking whether $p$ is sos is an SDP: $p$ is sos if and only if $p(x) = z(x)^\top Q\, z(x)$ for some $Q \succeq 0$, where $z(x)$ is the vector of monomials of degree at most $\deg(p)/2$
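A small worked example (ours): with $z(x) = (1, x, x^2)^\top$,

```latex
x^4 + 4x^3 + 6x^2 + 4x + 2
\;=\; (x^2 + 2x + 1)^2 + 1^2
\;=\; z(x)^{\top}
\begin{pmatrix} 2 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}
z(x),
```

where the Gram matrix is positive semidefinite (it is the sum of the rank-one matrices coming from the two squares).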
Sum of Squares Convexity
Definition: A polynomial $p$ is sos-convex if the polynomial $(x, y) \mapsto y^\top \nabla^2 p(x)\, y$ is a sum of squares
Sos-convexity implies convexity, and (unlike convexity) it can be checked by solving an SDP
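A tiny example (ours): $p(x) = x^4$ is sos-convex, since

```latex
y^{\top} \nabla^2 p(x)\, y \;=\; 12\,x^2 y^2 \;=\; \big(2\sqrt{3}\,x y\big)^2 .
```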
SoS-Convex Polynomials and SoS Relaxations
Theorem (Lasserre, ’09): If $f$ and $-g_1, \dots, -g_m$ are sos-convex, then the first-order Lasserre relaxation of the POP below is exact, i.e., it recovers the optimal value.
POP: $\min_{x \in \mathbb{R}^n} f(x)$ subject to $g_i(x) \ge 0,\; i = 1, \dots, m$
1st-order Lasserre relaxation (sos form): $\max_{\gamma,\, \lambda \ge 0} \gamma$ subject to $f(x) - \gamma - \sum_{i=1}^m \lambda_i g_i(x)$ being sos
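In the unconstrained case this reduces to $\max \gamma$ s.t. $p(x) - \gamma$ is sos, a small SDP. A minimal cvxpy sketch (ours, with an arbitrarily chosen quartic):

```python
import cvxpy as cp

# Hypothetical quartic p(x) = 5 + x + 2x^2 - 3x^3 + x^4; c[k] is the coefficient of x^k.
c = [5.0, 1.0, 2.0, -3.0, 1.0]

# p(x) - gamma is sos iff p(x) - gamma = z(x)^T Q z(x) with Q PSD, z = (1, x, x^2).
Q = cp.Variable((3, 3), symmetric=True)
gamma = cp.Variable()
constraints = [
    Q >> 0,                          # Gram matrix is positive semidefinite
    Q[0, 0] == c[0] - gamma,         # constant term
    2 * Q[0, 1] == c[1],             # x
    2 * Q[0, 2] + Q[1, 1] == c[2],   # x^2
    2 * Q[1, 2] == c[3],             # x^3
    Q[2, 2] == c[4],                 # x^4
]
cp.Problem(cp.Maximize(gamma), constraints).solve()
print(gamma.value)  # certified global lower bound on p (tight for univariate polynomials)
```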
Outline
Our Newton Algorithm
Main idea: Leverage sos convexity to produce an implementable algorithm
Theorem (Ahmadi, Chaudhry, JZ ’22): There is a $d$th-order generalization of Newton’s method in which each iteration amounts to solving an SDP of size polynomial in the number of variables, and whose iterates converge locally, with order of convergence $d$, to any local minimizer at which the Hessian is positive definite.
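Schematically (our paraphrase; the exact exponent and constants follow the paper’s analysis, with the regularization exponent even so the surrogate is a polynomial), each iteration minimizes an sos-convex regularization of the degree-$d$ Taylor model:

```latex
x_{k+1} \;\in\; \operatorname*{arg\,min}_{y}\; f_{x_k,d}(y) \;+\; c\,\|y - x_k\|^{d+2} .
```

Because the surrogate is sos-convex, its minimization can be carried out by semidefinite programming.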
To-do checklist
An SoS-convex approximation exists
Lemma

Lemma
A consequence of a more general statement about cones
And it has a unique global minimum
Lemma
Proof (of lemma) sketch:
Lemma
Proof: Using a quadrature rule by Clenshaw and Curtis
Interpretation: When integrating the Hessian of a function along a line in a region where the function is convex, the lowest eigenvalue can be bounded below based on the endpoints of the integration.
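The identity behind this (standard calculus, added for context):

```latex
\nabla f(z) - \nabla f(x) \;=\; \int_0^1 \nabla^2 f\big(x + t\,(z - x)\big)\,(z - x)\,\mathrm{d}t .
```

Clenshaw–Curtis quadrature expresses such an integral as a nonnegatively weighted sum of Hessian evaluations along the segment, including its endpoints, which is what makes the endpoint eigenvalue bound possible.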
Convergence
Lemma
Use an argument that the iterate is “close enough” to a point with a positive definite Hessian
Convergence
Let’s try to generalize where we can
Convergence
Quadrature again!
Lemma
We can lower bound this using the lemma.
We just got everything we wanted!
Convergence
Outline
The univariate setting
Example 1 (strictly convex)
Example 2 (strongly convex)
Example 3: the Beale function (nonconvex)
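For reference (standard definition, not from the slides), the Beale function, whose global minimum is $0$ at $(3, 0.5)$:

```python
def beale(x, y):
    """Beale test function: nonconvex, with global minimum 0 at (3, 0.5)."""
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)

print(beale(3.0, 0.5))  # -> 0.0
```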
An addendum: global convergence
Thank you!
Questions?