Inference, Computation, and Games

Florian Schäfer

RPI, September 2022


Overview

Part 1: Competitive Optimization. How to generalize gradient descent to competitive multi-agent optimization?

Part 2: Cholesky Factors and Probability. Fast solvers for PDEs and Gaussian statistics.

Part 3: What’s next?


Part 1: COMPETITIVE OPTIMIZATION

Optimization


Competitive Optimization


The benefits of competition

Multi-agent RL

Constrained and robust optimization

Adversarial learning

AlphaStar by Google DeepMind

Produced by StyleGAN2 on https://thispersondoesnotexist.com/

COMPETITIVE GRADIENT DESCENT

Gradient descent for competitive optimization

S, Anandkumar, NeurIPS 2019

Anima Anandkumar

What about Gradient Descent?

Revisiting Gradient Descent

How to linearize a game?

One agent: Optimal strategy of regularized linear local approximation.

Two agents: Optimal strategies of regularized linear local approximation.
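A sketch of the resulting update, following the CGD paper (S, Anandkumar, NeurIPS 2019), for two players solving min_x f(x, y) and min_y g(x, y) with step size η:

```latex
% Regularized bilinear local game at the current iterate:
\min_{\delta x}\ \delta x^\top \nabla_x f \;+\; \delta x^\top \bigl(D^2_{xy} f\bigr)\,\delta y \;+\; \tfrac{1}{2\eta}\lVert\delta x\rVert^2,
\qquad
\min_{\delta y}\ \delta y^\top \nabla_y g \;+\; \delta y^\top \bigl(D^2_{yx} g\bigr)\,\delta x \;+\; \tfrac{1}{2\eta}\lVert\delta y\rVert^2 .

% Its unique Nash equilibrium has a closed form, which defines the CGD update:
\delta x = -\eta\,\bigl(\mathrm{Id} - \eta^2 D^2_{xy} f\, D^2_{yx} g\bigr)^{-1}\bigl(\nabla_x f - \eta\, D^2_{xy} f\,\nabla_y g\bigr),
\qquad
\delta y = -\eta\,\bigl(\mathrm{Id} - \eta^2 D^2_{yx} g\, D^2_{xy} f\bigr)^{-1}\bigl(\nabla_y g - \eta\, D^2_{yx} g\,\nabla_x f\bigr).
```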


Linear for each Player?

Multilinear!

Solving for Nash Equilibrium


Competitive Gradient Descent


I think they think I think they...
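The title refers to the Neumann series of the matrix inverse in the CGD update above; truncating it at increasing order corresponds to increasingly deep reasoning about the opponent (a sketch; the series converges when the spectral radius of the product term is below one):

```latex
\bigl(\mathrm{Id} - \eta^2 D^2_{xy} f\, D^2_{yx} g\bigr)^{-1}
  \;=\; \sum_{k \ge 0} \bigl(\eta^2\, D^2_{xy} f\, D^2_{yx} g\bigr)^{k}
% k = 0: react to the opponent's expected gradient step ("I think they move along their gradient").
% k = 1: also anticipate their reaction to that reaction, and so on.
% The full inverse is the limit of the "I think they think I think ..." recursion.
```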


Why Bilinear? Why not Newton?


Convergence and complexity


Application: GENERATIVE ADVERSARIAL NETWORKS

S, Zheng, Anandkumar, ICML 2020

Hongkai Zheng

Anima Anandkumar

Implicit regularization of CGD

  1. Use WGAN-GP model.
  2. Remove gradient penalty.
  3. Train with CGD.

Better Inception Score (IS) and Fréchet Inception Distance (FID) on CIFAR-10.

1: Implicit competitive regularization in GANs, S, Zheng, and Anandkumar, ICML 2020

Application: DEEP LEARNING WITH (PHYSICS) CONSTRAINTS

Spencer Bryngelson

Qi Zeng

Zeng, Bryngelson, S, 2022

In Physics Informed Learning:


1: Physics-informed neural networks, Raissi, Perdikaris, Karniadakis, 2019


No “squaring” of the differential operator

Near-single precision accuracy possible

1: Competitive Physics Informed Networks, Zeng, Bryngelson, S, 2022
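Schematically, the contrast with standard PINN training is below; the competitive formulation is shown only as a simplified sketch, and the discriminator architecture and exact loss in the paper may differ:

```latex
% Standard PINN: least squares on the PDE residual, which effectively squares
% the differential operator \mathcal{D} (and its conditioning):
\min_\theta\ \sum_i \bigl(\mathcal{D} u_\theta(x_i) - f(x_i)\bigr)^2
% Competitive sketch: a discriminator d_\phi adversarially weights the residuals,
% so the operator is never squared; the pair (u_\theta, d_\phi) is trained with CGD:
\min_\theta\ \max_\phi\ \sum_i d_\phi(x_i)\,\bigl(\mathcal{D} u_\theta(x_i) - f(x_i)\bigr)
```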

Benchmarks: Poisson problem, Cahn-Hilliard, nonlinear Schrödinger, Poisson problem (ablation study)

Application: CONSTRAINED OPTIMIZATION

Lagrangian Duality:
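The standard construction that turns a constrained problem into a two-player game, and hence into a target for competitive gradient methods:

```latex
% Equality-constrained problem and its Lagrangian:
\min_x f(x) \quad \text{s.t.} \quad g(x) = 0,
\qquad
\mathcal{L}(x, \lambda) \;=\; f(x) + \lambda^\top g(x).
% Constrained minimizers are saddle points of the two-player game
\min_x\ \max_\lambda\ \mathcal{L}(x, \lambda)
% (for inequality constraints g(x) \le 0, restrict the multipliers to \lambda \ge 0).
```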


In Computer Graphics:

Constrained Willmore Surfaces

Yousuf Soliman, Albert Chern, Olga Diamanti, Felix Knöppel, Ulrich Pinkall, and Peter Schröder, SIGGRAPH 2021

Figure: Competitive Gradient Descent vs. Augmented Lagrangian

Application: MULTI-AGENT REINFORCEMENT LEARNING

Competitive Policy Gradient:

Prajapat et al.1 use CGD in two-agent RL.

Competitive policy gradient theorem for computing Hessian-vector products.

Competitive policy gradient (CoPG) greatly improves the quality of learned policies.

1: Competitive Policy Optimization, Prajapat, Azizzadenesheli, Liniger, Yue, Anandkumar, UAI 2021

Markov Soccer (discrete state space)

Figure: simultaneous policy gradient vs. competitive policy gradient

CoPG lets agents learn to defend!

Summary

Extend gradient descent to competitive optimization.

A bilinear local approximation is natural for competitive games.

Applied to GANs, computer graphics, and multi-agent RL.


Further Work

Inequality constraints using dual geometry1

The “GAN dilemma”2

Polymatrix CGD for more than 2 players3

1: Competitive Mirror Descent, S, Anandkumar, and Owhadi

2: Implicit competitive regularization in GANs, S, Zheng, and Anandkumar, ICML 2020

3: Polymatrix Competitive Gradient Descent, Ma, Letcher, S, Shi, and Anandkumar

Part 2: PROBABILISTIC VIEW ON CHOLESKY FACTORIZATION

Setting for rigorous results


Why should we care?


What do we need?


 


When using standard, dense linear algebra: Cholesky factorization of an N × N covariance matrix takes O(N³) time and O(N²) memory.


What’s new?


 

Algorithm I

S, Sullivan, Owhadi, SIAM MMS 2021

Tim Sullivan

Houman Owhadi

Our Method


Ordering and sparsity pattern


Incomplete Cholesky
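As a reference point, a minimal dense-array sketch of zero-fill incomplete Cholesky on a prescribed lower-triangular sparsity pattern (illustrative only, not the optimized implementation from the paper):

```python
import numpy as np

def ichol(A, pattern):
    """Zero-fill incomplete Cholesky.

    A       : symmetric positive-definite (n, n) array.
    pattern : boolean (n, n) lower-triangular mask that includes the diagonal.
    Returns a lower-triangular L supported on `pattern` such that L @ L.T
    matches A on the pattern (exact Cholesky if the pattern is full).
    """
    n = A.shape[0]
    L = np.tril(np.where(pattern, A, 0.0))      # start from A restricted to the pattern
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])              # pivot (assumed to stay positive)
        L[k + 1:, k] /= L[k, k]
        for j in range(k + 1, n):
            # right-looking update of the trailing matrix, restricted to the pattern
            L[j:, j] -= np.where(pattern[j:, j], L[j:, k] * L[j, k], 0.0)
    return L
```

Combined with the right ordering and sparsity pattern, the entries that are dropped are exactly the ones that the screening effect makes negligible, which is the "fade-out" behavior described below.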


Why does it work?


Fade-out instead of Fill-in

Classical:

  • Sparse factorization of sparse matrices
  • Dense factors by fill-in

New:

  • Sparse factorization of dense matrices
  • Sparse factors by fade-out


Probabilistic view on Elimination
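In outline, the identity behind this viewpoint, stated for a centered Gaussian vector x ~ N(0, Θ):

```latex
% Eliminating the first block of variables produces the Schur complement,
% which is exactly the conditional covariance of the remaining variables:
\Theta = \begin{pmatrix} \Theta_{11} & \Theta_{12} \\ \Theta_{21} & \Theta_{22} \end{pmatrix},
\qquad
\Theta_{22} - \Theta_{21}\Theta_{11}^{-1}\Theta_{12} \;=\; \operatorname{Cov}\bigl(x_2 \mid x_1\bigr).
% Hence the Cholesky factor \Theta = L L^\top collects conditional (co)variances:
L_{ij} \;=\; \frac{\operatorname{Cov}\bigl(x_i, x_j \mid x_{1:j-1}\bigr)}{\sqrt{\operatorname{Var}\bigl(x_j \mid x_{1:j-1}\bigr)}},
\qquad i \ge j .
```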


The Screening Effect

“The screening effect is the geostatistical term for the phenomenon of nearby observations tending to reduce the influence of more distant observations when using kriging (optimal linear prediction) for spatial interpolation” [Stein, 2012]

Conditioning on nearby points shortens the correlation length.

Prove1 exponential decay using tools2,3,4 from numerical homogenization.

1: S et al. (2020), 2: Målqvist & Peterseim (2014), 3: Owhadi (2015), 4: Owhadi & Scovel (2017)
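A small numerical illustration of screening, using a hypothetical 1-D example with an exponential kernel (where conditioning on points lying between two sites removes their correlation exactly; for smoother kernels in higher dimension the effect is approximate but still strong):

```python
import numpy as np

# 1-D grid and an exponential (Matern-1/2) kernel with length scale 0.3
x = np.linspace(0.0, 1.0, 201)
Theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)

def conditional_cov(Theta, keep, cond):
    """Covariance of the `keep` variables after conditioning on the `cond` variables."""
    A = Theta[np.ix_(keep, keep)]
    B = Theta[np.ix_(keep, cond)]
    C = Theta[np.ix_(cond, cond)]
    return A - B @ np.linalg.solve(C, B.T)

i, j = 100, 140                                          # two sites roughly 0.2 apart
nearby = [95, 96, 97, 98, 99, 101, 102, 103, 104, 105]   # observations surrounding site i

prior = Theta[i, j] / np.sqrt(Theta[i, i] * Theta[j, j])
post = conditional_cov(Theta, [i, j], nearby)
posterior = post[0, 1] / np.sqrt(post[0, 0] * post[1, 1])
print(f"correlation before conditioning: {prior:.3f}, after: {posterior:.1e}")
```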


Screening and Homogenization


Figure: fine-scale equation vs. coarse-grained PDE; fine scale vs. coarse scale; multiscale basis; (inverse) stiffness matrix in the multiscale basis


Summary


1: S, Sullivan, Owhadi (2021)

Algorithm II: RECONSTRUCTION FROM BLACK BOX SOLVERS

S, Owhadi, 2021

Houman Owhadi


Our result:


Coloring and Peeling


1: Fast construction of hierarchical matrix representation from matrix-vector multiplication, Lin, Lu, Ying (2010)

Cholesky reconstruction


 

Algorithm III

Chen, S, Huang, and Desbrun, SIGGRAPH 2021

Jiong Chen

Jin Huang

Mathieu Desbrun


1: Schröder et al. (1978) 2: Gines et al. (1998) 3: Ho and Ling (2013) 4: Owhadi (2017)

Fast solver by ICHOL(0)

In practice: Use as preconditioner for CG

Use multicolor ordering and supernodes
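A minimal sketch of the preconditioned-CG step, assuming a factor L from an incomplete Cholesky routine such as the ichol sketch above (SciPy-based illustration, not the supernodal implementation from the paper):

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, cg

def ichol_preconditioner(L):
    """Wrap a (possibly incomplete) Cholesky factor L as M ~= A^{-1} for CG."""
    n = L.shape[0]
    def apply(v):
        y = solve_triangular(L, v, lower=True)        # forward substitution
        return solve_triangular(L.T, y, lower=False)  # backward substitution
    return LinearOperator((n, n), matvec=apply)

# Usage, with A and b given: x, info = cg(A, b, M=ichol_preconditioner(L))
```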

Chen, S, Huang, and Desbrun, SIGGRAPH 2021

Summary

  • Fast solver by incomplete Cholesky
  • Best asymptotic results in the literature for inverses of general elliptic Green’s matrices
  • Competes with established AMG libraries such as Trilinos and AMGCL


Algorithm IV: SPARSE CHOLESKY FACTORS BY KL MINIMIZATION

S, Katzfuss, Owhadi, SIAM Scientific Computing 2021

Houman Owhadi

Matthias Katzfuß


Factorization by KL Minimization


1: Vecchia (1988), 2: Kaporin (1990), 3: Kolotilina and Yeremin (1992), 4: Marzouk et al. (2016)
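The KL-optimal factor has a closed-form, embarrassingly parallel column update; here is a dense-array sketch (following the KL-minimization paper, with simplified indexing conventions: sparsity[i] lists the row indices allowed in column i, with i itself listed first):

```python
import numpy as np

def kl_min_cholesky(Theta, sparsity):
    """KL-optimal sparse factor L, so that L @ L.T approximates Theta^{-1}."""
    n = Theta.shape[0]
    L = np.zeros((n, n))
    for i in range(n):                                   # columns are independent: trivially parallel
        s = sparsity[i]
        e1 = np.zeros(len(s))
        e1[0] = 1.0                                      # unit vector for the entry i within s
        col = np.linalg.solve(Theta[np.ix_(s, s)], e1)   # Theta_{s,s}^{-1} e_1
        L[s, i] = col / np.sqrt(col[0])                  # normalize by sqrt(e_1^T Theta_{s,s}^{-1} e_1)
    return L
```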

Aggregation into Supernodes


Limitations of Screening

Density fluctuation: Weakened decay

No boundary values: Weak screening along the boundary


Experiment: Spatial Statistics


Experiment: Aggregation


Comparison with HSS Matrices


Summary


Further Work

KL minimization for solving nonlinear PDEs1

Screening-based nonlinear transport maps2

Near-optimal sparsity pattern by maximizing mutual information3

1: Sparse Cholesky factorization for solving nonlinear PDEs via Gaussian processes, Chen, Owhadi, S, in preparation

2: Scalable Bayesian transport maps for high-dimensional non-Gaussian spatial fields, Katzfuss, S, 2021

3: Sparse Cholesky factorization by greedy conditional selection, Huan, Guinness, Katzfuss, Owhadi, S, in preparation