Inference, Computation, and Games

Florian Schäfer

RPI, September 2022


Overview

Part 1: Competitive Optimization. How to generalize gradient descent to competitive multi-agent optimization?

Part 2: Cholesky Factors and Probability. Fast solvers for PDEs and Gaussian statistics.

Part 3: What’s next?


Part 1: COMPETITIVE OPTIMIZATION

Optimization


Competitive Optimization


The benefits of competition

Multi-agent RL

Constrained and robust optimization

Adversarial learning

AlphaStar by Google DeepMind

Produced by StyleGAN2 on https://thispersondoesnotexist.com/

COMPETITIVE GRADIENT DESCENT

Gradient descent for competitive optimization

S, Anandkumar, NeurIPS 2019

Anima Anandkumar

What about Gradient Descent?

Revisiting Gradient Descent

How to linearize a game?

One agent: Optimal strategy of regularized linear local approximation.

Two agents: Optimal strategies of regularized linear local approximation.
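A sketch of the resulting update, following the CGD paper (S, Anandkumar, NeurIPS 2019), for two players solving min_x f(x, y) and min_y g(x, y) with step size η:

```latex
% Regularized bilinear local game at the current iterate:
\min_{\delta x}\ \delta x^\top \nabla_x f \;+\; \delta x^\top \bigl(D^2_{xy} f\bigr)\,\delta y \;+\; \tfrac{1}{2\eta}\lVert\delta x\rVert^2,
\qquad
\min_{\delta y}\ \delta y^\top \nabla_y g \;+\; \delta y^\top \bigl(D^2_{yx} g\bigr)\,\delta x \;+\; \tfrac{1}{2\eta}\lVert\delta y\rVert^2 .

% Its unique Nash equilibrium has a closed form, which defines the CGD update:
\delta x = -\eta\,\bigl(\mathrm{Id} - \eta^2 D^2_{xy} f\, D^2_{yx} g\bigr)^{-1}\bigl(\nabla_x f - \eta\, D^2_{xy} f\,\nabla_y g\bigr),
\qquad
\delta y = -\eta\,\bigl(\mathrm{Id} - \eta^2 D^2_{yx} g\, D^2_{xy} f\bigr)^{-1}\bigl(\nabla_y g - \eta\, D^2_{yx} g\,\nabla_x f\bigr).
```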


Linear for each Player?

Multilinear!

Solving for Nash Equilibrium


Competitive Gradient Descent


I think they think I think they...
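The title refers to the Neumann series of the matrix inverse in the CGD update above; truncating it at increasing order corresponds to increasingly deep reasoning about the opponent (a sketch; the series converges when the spectral radius of the product term is below one):

```latex
\bigl(\mathrm{Id} - \eta^2 D^2_{xy} f\, D^2_{yx} g\bigr)^{-1}
  \;=\; \sum_{k \ge 0} \bigl(\eta^2\, D^2_{xy} f\, D^2_{yx} g\bigr)^{k}
% k = 0: react to the opponent's expected gradient step ("I think they move along their gradient").
% k = 1: also anticipate their reaction to that reaction, and so on.
% The full inverse is the limit of the "I think they think I think ..." recursion.
```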


Why Bilinear? Why not Newton?


Convergence and complexity


Application: GENERATIVE ADVERSARIAL NETWORKS

S, Zheng, Anandkumar, ICML 2020

Hongkai Zheng

Anima Anandkumar

Implicit regularization of CGD

  1. Use WGAN-GP model.
  2. Remove gradient penalty.
  3. Train with CGD.

Better Inception Score (IS) and Fréchet Inception Distance (FID) on CIFAR-10.

1: Implicit competitive regularization in GANs, S, Zheng, and Anandkumar, ICML 2020

Application: DEEP LEARNING WITH (PHYSICS) CONSTRAINTS

Spencer Bryngelson

Qi Zeng

Zeng, Bryngelson, S, 2022

In Physics Informed Learning:


1: Physics-informed neural networks, Raissi, Perdikaris, Karniadakis, 2019


No “squaring” of the differential operator

Near-single precision accuracy possible

1: Competitive Physics Informed Networks, Zeng, Bryngelson, S, 2022
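Schematically, the contrast with standard PINN training is below; the competitive formulation is shown only as a simplified sketch, and the discriminator architecture and exact loss in the paper may differ:

```latex
% Standard PINN: least squares on the PDE residual, which effectively squares
% the differential operator \mathcal{D} (and its conditioning):
\min_\theta\ \sum_i \bigl(\mathcal{D} u_\theta(x_i) - f(x_i)\bigr)^2
% Competitive sketch: a discriminator d_\phi adversarially weights the residuals,
% so the operator is never squared; the pair (u_\theta, d_\phi) is trained with CGD:
\min_\theta\ \max_\phi\ \sum_i d_\phi(x_i)\,\bigl(\mathcal{D} u_\theta(x_i) - f(x_i)\bigr)
```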

Benchmarks: Poisson problem, Cahn-Hilliard, nonlinear Schrödinger, Poisson problem (ablation study)

Application: CONSTRAINED OPTIMIZATION

Lagrangian Duality:
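The standard construction that turns a constrained problem into a two-player game, and hence into a target for competitive gradient methods:

```latex
% Equality-constrained problem and its Lagrangian:
\min_x f(x) \quad \text{s.t.} \quad g(x) = 0,
\qquad
\mathcal{L}(x, \lambda) \;=\; f(x) + \lambda^\top g(x).
% Constrained minimizers are saddle points of the two-player game
\min_x\ \max_\lambda\ \mathcal{L}(x, \lambda)
% (for inequality constraints g(x) \le 0, restrict the multipliers to \lambda \ge 0).
```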


In Computer Graphics:

Constrained Willmore Surfaces

Yousuf Soliman, Albert Chern, Olga Diamanti, Felix Knöppel, Ulrich Pinkall, and Peter Schröder, SIGGRAPH 2021

Figure: Competitive Gradient Descent vs. Augmented Lagrangian

Application: MULTI-AGENT REINFORCEMENT LEARNING

Competitive Policy Gradient:

Prajapat et al.1 use CGD in two-agent RL.

Competitive policy gradient theorem for computing Hessian-vector products.

Competitive policy gradient (CoPG) greatly improves the quality of learned policies.

1: Competitive Policy Optimization, Prajapat, Azizzadenesheli, Liniger, Yue, Anandkumar, UAI 2021

Markov Soccer (discrete state space)

Figure: simultaneous policy gradient vs. competitive policy gradient

CoPG lets agents learn to defend!

Summary

Extend gradient descent to competitive optimization.

A bilinear local approximation is natural for competitive games.

Applied to GANs, computer graphics, and multi-agent RL.


Further Work

Inequality constraints using dual geometry1

The “GAN dilemma”2

Polymatrix CGD for more than 2 players3

1: Competitive Mirror Descent, S, Anandkumar, and Owhadi

2: Implicit competitive regularization in GANs, S, Zheng, and Anandkumar, ICML 2020

3: Polymatrix Competitive Gradient Descent, Ma, Letcher, S, Shi, and Anandkumar

Part 2: PROBABILISTIC VIEW ON CHOLESKY FACTORIZATION

Setting for rigorous results


Why should we care?


What do we need?


 


When using standard, dense linear algebra: Cholesky factorization of an N × N covariance matrix takes O(N³) time and O(N²) memory.


What’s new?


 

Algorithm I

S, Sullivan, Owhadi, SIAM MMS 2021

Tim Sullivan

Houman Owhadi

Our Method


Ordering and sparsity pattern


Incomplete Cholesky
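As a reference point, a minimal dense-array sketch of zero-fill incomplete Cholesky on a prescribed lower-triangular sparsity pattern (illustrative only, not the optimized implementation from the paper):

```python
import numpy as np

def ichol(A, pattern):
    """Zero-fill incomplete Cholesky.

    A       : symmetric positive-definite (n, n) array.
    pattern : boolean (n, n) lower-triangular mask that includes the diagonal.
    Returns a lower-triangular L supported on `pattern` such that L @ L.T
    matches A on the pattern (exact Cholesky if the pattern is full).
    """
    n = A.shape[0]
    L = np.tril(np.where(pattern, A, 0.0))      # start from A restricted to the pattern
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])              # pivot (assumed to stay positive)
        L[k + 1:, k] /= L[k, k]
        for j in range(k + 1, n):
            # right-looking update of the trailing matrix, restricted to the pattern
            L[j:, j] -= np.where(pattern[j:, j], L[j:, k] * L[j, k], 0.0)
    return L
```

Combined with the right ordering and sparsity pattern, the entries that are dropped are exactly the ones that the screening effect makes negligible, which is the "fade-out" behavior described below.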


Why does it work?


Fade-out instead of Fill-in

Classical:

  • Sparse factorization of sparse matrices
  • Dense factors by fill-in

New:

  • Sparse factorization of dense matrices
  • Sparse factors by fade-out


Probabilistic view on Elimination
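In outline, the identity behind this viewpoint, stated for a centered Gaussian vector x ~ N(0, Θ):

```latex
% Eliminating the first block of variables produces the Schur complement,
% which is exactly the conditional covariance of the remaining variables:
\Theta = \begin{pmatrix} \Theta_{11} & \Theta_{12} \\ \Theta_{21} & \Theta_{22} \end{pmatrix},
\qquad
\Theta_{22} - \Theta_{21}\Theta_{11}^{-1}\Theta_{12} \;=\; \operatorname{Cov}\bigl(x_2 \mid x_1\bigr).
% Hence the Cholesky factor \Theta = L L^\top collects conditional (co)variances:
L_{ij} \;=\; \frac{\operatorname{Cov}\bigl(x_i, x_j \mid x_{1:j-1}\bigr)}{\sqrt{\operatorname{Var}\bigl(x_j \mid x_{1:j-1}\bigr)}},
\qquad i \ge j .
```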


The Screening Effect

“The screening effect is the geostatistical term for the phenomenon of nearby observations tending to reduce the influence of more distant observations when using kriging (optimal linear prediction) for spatial interpolation” [Stein, 2012]

Conditioning on nearby points shortens the correlation length.

Prove1 exponential decay using tools2,3,4 from numerical homogenization.

1: S et al. (2020), 2: Målqvist & Peterseim (2014), 3: Owhadi (2015), 4: Owhadi & Scovel (2017)
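A small numerical illustration of screening, using a hypothetical 1-D example with an exponential kernel (where conditioning on points lying between two sites removes their correlation exactly; for smoother kernels in higher dimension the effect is approximate but still strong):

```python
import numpy as np

# 1-D grid and an exponential (Matern-1/2) kernel with length scale 0.3
x = np.linspace(0.0, 1.0, 201)
Theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)

def conditional_cov(Theta, keep, cond):
    """Covariance of the `keep` variables after conditioning on the `cond` variables."""
    A = Theta[np.ix_(keep, keep)]
    B = Theta[np.ix_(keep, cond)]
    C = Theta[np.ix_(cond, cond)]
    return A - B @ np.linalg.solve(C, B.T)

i, j = 100, 140                                          # two sites roughly 0.2 apart
nearby = [95, 96, 97, 98, 99, 101, 102, 103, 104, 105]   # observations surrounding site i

prior = Theta[i, j] / np.sqrt(Theta[i, i] * Theta[j, j])
post = conditional_cov(Theta, [i, j], nearby)
posterior = post[0, 1] / np.sqrt(post[0, 0] * post[1, 1])
print(f"correlation before conditioning: {prior:.3f}, after: {posterior:.1e}")
```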


Screening and Homogenization


Figure: fine-scale equation vs. coarse-grained PDE; fine scale vs. coarse scale; multiscale basis; (inverse) stiffness matrix in the multiscale basis


Summary


1: S, Sullivan, Owhadi (2021)

Algorithm II: RECONSTRUCTION FROM BLACK BOX SOLVERS

S, Owhadi, 2021

Houman Owhadi


Our result:


Coloring and Peeling


1: Fast construction of hierarchical matrix representation from matrix-vector multiplication, Lin, Lu, Ying (2010)

Cholesky reconstruction


 

Algorithm III

Chen, S, Huang, and Desbrun, SIGGRAPH 2021

Jiong Chen

Jin Huang

Mathieu Desbrun


1: Schröder et al. (1978) 2: Gines et al. (1998) 3: Ho and Ling (2013) 4: Owhadi (2017)

Fast solver by ICHOL(0)

In practice: Use as preconditioner for CG

Use multicolor ordering and supernodes
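A minimal sketch of the preconditioned-CG step, assuming a factor L from an incomplete Cholesky routine such as the ichol sketch above (SciPy-based illustration, not the supernodal implementation from the paper):

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, cg

def ichol_preconditioner(L):
    """Wrap a (possibly incomplete) Cholesky factor L as M ~= A^{-1} for CG."""
    n = L.shape[0]
    def apply(v):
        y = solve_triangular(L, v, lower=True)        # forward substitution
        return solve_triangular(L.T, y, lower=False)  # backward substitution
    return LinearOperator((n, n), matvec=apply)

# Usage, with A and b given: x, info = cg(A, b, M=ichol_preconditioner(L))
```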

Chen, S, Huang, and Desbrun, SIGGRAPH 2021

Summary

  • Fast solver by incomplete Cholesky
  • Best asymptotic results in the literature for inverses of general elliptic Green’s matrices
  • Competes with established AMG libraries such as Trilinos and AMGCL


Algorithm IV: SPARSE CHOLESKY FACTORS BY KL MINIMIZATION

S, Katzfuss, Owhadi, SIAM Scientific Computing 2021

Houman Owhadi

Matthias Katzfuß


Factorization by KL Minimization


1: Vecchia (1988), 2: Kaporin (1990), 3: Kolotilina and Yeremin (1992), 4: Marzouk et al. (2016)
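The KL-optimal factor has a closed-form, embarrassingly parallel column update; here is a dense-array sketch (following the KL-minimization paper, with simplified indexing conventions: sparsity[i] lists the row indices allowed in column i, with i itself listed first):

```python
import numpy as np

def kl_min_cholesky(Theta, sparsity):
    """KL-optimal sparse factor L, so that L @ L.T approximates Theta^{-1}."""
    n = Theta.shape[0]
    L = np.zeros((n, n))
    for i in range(n):                                   # columns are independent: trivially parallel
        s = sparsity[i]
        e1 = np.zeros(len(s))
        e1[0] = 1.0                                      # unit vector for the entry i within s
        col = np.linalg.solve(Theta[np.ix_(s, s)], e1)   # Theta_{s,s}^{-1} e_1
        L[s, i] = col / np.sqrt(col[0])                  # normalize by sqrt(e_1^T Theta_{s,s}^{-1} e_1)
    return L
```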

Aggregation into Supernodes


Limitations of Screening

Density fluctuation: Weakened decay

No boundary values: Weak screening along the boundary


Experiment: Spatial Statistics


Experiment: Aggregation


Comparison with HSS Matrices


Summary


Further Work

KL minimization for solving nonlinear PDEs1

Screening-based nonlinear transport maps2

Near-optimal sparsity pattern by maximizing mutual information3

1: Sparse Cholesky factorization for solving nonlinear PDEs via Gaussian processes, Chen, Owhadi, S, in preparation

2: Scalable Bayesian transport maps for high-dimensional non-Gaussian spatial fields, Katzfuss, S, 2021

3: Sparse Cholesky factorization by greedy conditional selection, Huan, Guinness, Katzfuss, Owhadi, S, in preparation