
300 Years Later, Newton's Method gets an Update

Jeffrey Zhang, Yale BIDS

Joint work with Amir Ali Ahmadi, Abraar Chaudry

AAMU-UAH Joint Seminar, 4/18/2025


Newton’s Method

Newton’s method is an iterative optimization algorithm that minimizes the second-order Taylor expansion of the objective at each iteration


The Newton iterate

  • $x_{k+1} = x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)$, the unique critical point of the second-order Taylor expansion of $f$ at $x_k$ (its minimizer when $\nabla^2 f(x_k) \succ 0$)
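For concreteness, here is a minimal NumPy sketch of the classical iteration (my own illustration, not code from the talk):

import numpy as np

def newton_method(grad, hess, x0, tol=1e-10, max_iter=50):
    # Classical Newton's method: repeatedly jump to the critical point of the
    # local quadratic model of the objective.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Newton step: solve the linear system rather than forming the inverse
        x = x - np.linalg.solve(hess(x), g)
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 * (y + 2)^2 (one step suffices)
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_method(grad, hess, [5.0, 5.0]))   # -> [ 1. -2.]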


Convergence of Newton’s Method

  • Classical result in optimization: Newton’s method has quadratic convergence
  • Formally,

Theorem

If $\nabla^2 f$ is Lipschitz continuous in a neighborhood of a local minimizer $x^*$ with $\nabla^2 f(x^*) \succ 0$, then for any starting point sufficiently close to $x^*$, the Newton iterates satisfy $\|x_{k+1} - x^*\| \le C\, \|x_k - x^*\|^2$ for some constant $C > 0$.


Newton’s method doesn’t always work!

Functions with S-shaped gradients are difficult for Newton’s method
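A classic illustration (my own example; the function pictured on the slide may differ): take $f$ strictly convex with the S-shaped gradient $f'(x) = \arctan(x)$, so $f''(x) = 1/(1+x^2)$ and the unique minimizer is $0$. The Newton step overshoots whenever the iterate is too far from $0$, and the method diverges:

import numpy as np

# Newton step for f'(x) = arctan(x):  x_{k+1} = x_k - f'(x_k) / f''(x_k)
#                                           = x_k - (1 + x_k^2) * arctan(x_k)
x = 2.0                       # far enough from the minimizer that Newton overshoots
for k in range(6):
    x = x - (1 + x**2) * np.arctan(x)
    print(k, x)
# The iterates alternate in sign and blow up; starting from x = 0.5 instead,
# the same recursion converges to 0 quadratically.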


Higher-order information in Newton’s method

Using higher-order derivative information can improve Newton’s method by building more accurate local approximations of the objective

Empirically, using higher-order information can lead to faster, more stable algorithms
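One standard way to make this precise (the talk’s exact notation may differ): a $d$-th order Newton method replaces the quadratic model with the degree-$d$ Taylor expansion,

$$x_{k+1} \in \arg\min_{x} \; T_{f,d}(x; x_k), \qquad T_{f,d}(x; x_k) := \sum_{j=0}^{d} \frac{1}{j!}\, \nabla^j f(x_k)[\underbrace{x - x_k, \ldots, x - x_k}_{j \text{ times}}],$$

which recovers the classical Newton step when $d = 2$.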


Why haven’t we done this?

Optimizing high-degree polynomials is hard!

  • Implementing these methods requires optimizing higher-degree polynomials

  • Informal theorem: most polynomial optimization problems become NP-hard once the degree reaches 4

Some NP-hard problems:

  • Globally minimizing a polynomial of degree 4
  • Locally minimizing a polynomial of degree 4
  • Finding a second-order point of a polynomial of degree 4
  • Finding a critical point of a polynomial of degree 3


A bridge to higher-order methods

If there were efficient algorithms for minimizing these higher-degree approximations, they would immediately yield higher-order Newton methods

Example: a third-order Newton method is possible!

Theorem (Silina, JZ ’22)

Theorem (Ahmadi, JZ ’22)

Local minima of cubic polynomials can be found by solving SDPs of size polynomial in the number of variables.


A different higher-order method

  •  

 


Outline

  • An Introduction to Sums of Squares

  • Convergence of Higher Order Newton Methods

  • Numerical Examples


Sums of Squares

  • A polynomial $p$ is a sum of squares (sos) if it can be written as $p(x) = \sum_i q_i(x)^2$ for some polynomials $q_i$
  • Equivalently, $p$ of degree $2d$ is sos if and only if $p(x) = z(x)^T Q\, z(x)$ for some positive semidefinite matrix $Q$, where $z(x)$ is the vector of monomials of degree at most $d$; this can be checked with a semidefinite program (SDP)
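A classical caveat worth keeping in mind (standard example, not from the slides): not every nonnegative polynomial is sos. The Motzkin polynomial $x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1$ is nonnegative on all of $\mathbb{R}^2$ but is not a sum of squares, which is why sos generally gives a relaxation rather than an exact reformulation.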


Sum of Squares and Polynomial Optimization

  • Sum of squares relaxations give computable lower bounds for polynomial optimization problems:

$$p^* := \min_{x \in \mathbb{R}^n} p(x) \;\;\ge\;\; p_{\mathrm{sos}} := \max_{\gamma \in \mathbb{R}} \left\{ \gamma \;:\; p(x) - \gamma \ \text{is sos} \right\},$$

and $p_{\mathrm{sos}}$ can be computed by solving an SDP

  • There are extensions for constrained optimization, but they won’t be needed here
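A minimal sketch of the unconstrained sos lower bound above for a univariate quartic, using cvxpy (my own illustration; the tooling and the example polynomial are assumptions, not from the talk):

import numpy as np
import cvxpy as cp

# p(x) = x^4 - 3x^2 + 1, coefficients by increasing power; its global minimum is -1.25
p = np.array([1.0, 0.0, -3.0, 0.0, 1.0])
d = 2                                    # half-degree; monomial basis z(x) = (1, x, x^2)

Q = cp.Variable((d + 1, d + 1), symmetric=True)   # Gram matrix of the sos certificate
gamma = cp.Variable()                             # the lower bound being maximized

# Match coefficients of p(x) - gamma with z(x)^T Q z(x) = sum_{i,j} Q_{ij} x^{i+j}
constraints = [Q >> 0]
for k in range(2 * d + 1):
    gram_coeff = sum(Q[i, k - i] for i in range(max(0, k - d), min(k, d) + 1))
    target = p[k] - gamma if k == 0 else p[k]
    constraints.append(gram_coeff == target)

cp.Problem(cp.Maximize(gamma), constraints).solve()
print(gamma.value)   # ~ -1.25; exact here, since nonnegative univariate polynomials are sos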


Sum of Squares Convexity

  •  

Definition

A polynomial $f$ is sos-convex if the polynomial $(x, y) \mapsto y^T \nabla^2 f(x)\, y$ is a sum of squares (equivalently, $\nabla^2 f(x) = M(x)^T M(x)$ for some matrix $M(x)$ with polynomial entries). Sos-convexity implies convexity and can be checked with an SDP.
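A small worked example (mine, not from the slides): $f(x_1, x_2) = x_1^4 + x_2^2$ is sos-convex, since

$$y^T \nabla^2 f(x)\, y \;=\; 12\, x_1^2 y_1^2 + 2\, y_2^2 \;=\; \left(2\sqrt{3}\, x_1 y_1\right)^2 + \left(\sqrt{2}\, y_2\right)^2,$$

a sum of squares in $(x, y)$.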


SoS Convex Polynomials and SoS Relaxations

  • Key theorem (by Lasserre):

Theorem (Lasserre, ’09)

If the objective and constraint polynomials of a polynomial optimization problem (POP) are sos-convex, then the first level of the Lasserre hierarchy is exact: its optimal value equals the optimal value of the POP.

POP:

1st order Lasserre relaxation:

  • All we need here is the unconstrained version: if $p$ is sos-convex, then $\min_x p(x) = \max\{\gamma : p(x) - \gamma \ \text{is sos}\}$


Outline

  • An Introduction to Sums of Squares

  • Convergence of Higher Order Newton Methods

  • Numerical Examples


Our Newton Algorithm

Main idea: Leverage sos convexity to produce an implementable algorithm
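A schematic of one iteration, pieced together from this slide and the regularization discussion at the end of the talk (the precise regularizer and constants are specified in the paper and are not reproduced here; treat the form below as an assumption for illustration):

$$x_{k+1} \in \arg\min_{x} \; T_{f,d}(x; x_k) \;+\; \lambda_k\, \rho(x - x_k),$$

where $T_{f,d}$ is the degree-$d$ Taylor expansion from before, $\rho$ is a fixed even-degree polynomial regularizer (for example $\|x - x_k\|^d$ with $d$ even), and $\lambda_k$ is chosen via an SDP so that the regularized model is sos-convex. That sos-convex model can then be minimized exactly by a single SDP, using the Lasserre result above.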

 

Theorem (Ahmadi, Chaudry, JZ ’22)


To-do checklist

  •  


An SoS-convex approximation exists

Lemma

 

Lemma

A consequence of a more general statement about cones


And it has a unique global minimum

Lemma

Proof (of lemma) sketch:

  1. Use the lemma to lower bound the (sos-convex) polynomial by a coercive quadratic function
  2. The polynomial must then be coercive, so it attains a global minimum
  3. If it had multiple global minima, then, being convex, it would be constant on the segment between them, hence on the entire line through them, contradicting coercivity

Lemma

Proof idea: use a quadrature rule due to Clenshaw and Curtis

Interpretation: when the Hessian of a function is integrated along a line segment lying in a region where the function is convex, the smallest eigenvalue of the resulting matrix can be bounded below in terms of the Hessians at the endpoints of the segment.
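In symbols, the kind of bound this gives (my paraphrase; the exact weights depend on the quadrature rule and the degree of $f$): if the polynomial $f$ is convex on the segment $[a, b]$, then

$$\int_0^1 \nabla^2 f\big(a + t(b - a)\big)\, dt \;\succeq\; w \left(\nabla^2 f(a) + \nabla^2 f(b)\right) \quad \text{for some } w > 0,$$

because a Clenshaw–Curtis rule with enough nodes evaluates the integral of each (polynomial) Hessian entry exactly, its weights are positive, its nodes include both endpoints, and the contributions of the interior nodes are positive semidefinite by convexity.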

 


Convergence

 

 

Lemma

Use an argument based on being “close enough” to a point with a positive definite Hessian

 

 


Convergence

Let’s try to generalize where we can

 

 


Convergence

Quadrature again!

 

Lemma

We can lower bound this using the lemma.

 

We just got everything we wanted!


Convergence

  •  


Convergence

  •  


Outline

  • An Introduction to Sums of Squares

  • Convergence of Higher Order Newton Methods

  • Numerical Examples


The univariate setting

  •  

 

 

 


Example 1

 

 

(strictly convex)

 


Example 2

(strongly convex)

 


Example 3

The Beale function (nonconvex)
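For reference, a quick definition and sanity check of this standard test function in Python (my own snippet; it does not reproduce the talk’s experiments):

import numpy as np

def beale(v):
    # Beale test function: nonconvex, with global minimum 0 at (3, 0.5)
    x, y = v
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)

print(beale(np.array([3.0, 0.5])))   # 0.0 at the known global minimizer
print(beale(np.array([0.0, 0.0])))   # 14.203125 at the origin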

 


An addendum: global convergence

  • Recall the convex regularized framework, which was globally convergent under certain assumptions:

  • Our algorithm chooses the coefficient on the regularizer based on the solution to an SDP
  • If we had an estimate of the relevant Lipschitz constant, we could take the larger of the SDP-based coefficient and the Lipschitz-based one
  • Caveat: slightly different handling is needed when d is even

 


Thank you!

Questions?