Inference, Computation, and Games
Florian Schäfer
RPI, September 2022
Overview
Part 1: Competitive Optimization. How can gradient descent be generalized to competitive multi-agent optimization?
Part 2: Cholesky Factors and Probability. Fast solvers for PDEs and Gaussian statistics.
Part 3: What’s next?
COMPETITIVE OPTIMIZATION
Part 1
Optimization
Competitive Optimization
The benefits of competition
Multi-agent RL
Constrained and robust optimization
Adversarial learning
AlphaStar by Google DeepMind
Produced by StyleGAN2 on https://thispersondoesnotexist.com/
COMPETITIVE GRADIENT DESCENT
Gradient descent for competitive optimization
S, Anandkumar, NeurIPS 2019
Anima Anandkumar
What about Gradient Descent?
Revisiting Gradient Descent
How to linearize a game?
One agent: Optimal strategy of regularized linear local approximation.
Two agents: Optimal strategies of regularized linear local approximation.
Linear for each Player?
Multilinear!
Solving for Nash Equilibrium
Competitive Gradient Descent
I think they think I think they...
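To make the update concrete, here is a minimal numpy sketch (my illustration, not from the slides) of the zero-sum CGD update on the bilinear game f(x, y) = x·y, where the x-player minimizes and the y-player maximizes. Simultaneous gradient descent spirals outward on this game, while CGD contracts to the equilibrium at the origin:

```python
import numpy as np

def sim_gd(x, y, eta, steps):
    # Simultaneous gradient descent on f(x, y) = x * y:
    # x descends on f, y ascends on f.
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y

def cgd(x, y, eta, steps):
    # Zero-sum competitive gradient descent on f(x, y) = x * y:
    # each player plays the Nash strategy of the regularized
    # bilinear local game (here the mixed derivative D_xy f = 1).
    for _ in range(steps):
        dx = -eta / (1 + eta**2) * (y + eta * x)
        dy = eta / (1 + eta**2) * (x - eta * y)
        x, y = x + dx, y + dy
    return x, y

xg, yg = sim_gd(1.0, 1.0, eta=0.2, steps=100)  # spirals outward
xc, yc = cgd(1.0, 1.0, eta=0.2, steps=100)     # converges to (0, 0)
```

Per step, simultaneous GD multiplies the squared distance to the equilibrium by (1 + η²), while CGD multiplies it by 1/(1 + η²).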
Why Bilinear? Why not Newton?
Convergence and complexity
GENERATIVE ADVERSARIAL NETWORKS
Application:
S, Zheng, Anandkumar, ICML 2020
Hongkai Zheng
Anima Anandkumar
Implicit regularization of CGD
Better Inception Score (IS) and FID on CIFAR-10
1: Implicit competitive regularization in GANs, S, Zheng, and Anandkumar, ICML 2020
DEEP LEARNING WITH (PHYSICS) CONSTRAINTS
Application:
Spencer Bryngelson
Qi Zeng
Zeng, Bryngelson, S, 2022
In Physics Informed Learning:
1: Physics-informed neural networks, Raissi, Perdikaris, Karniadakis, 2019
No “squaring” of the differential operator
Near-single precision accuracy possible
1: Competitive Physics Informed Networks, Zeng, Bryngelson, S, 2022
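The “no squaring” point can be seen on a toy operator: minimizing the squared residual effectively works with the normal equations, squaring the condition number, while a competitive formulation works with the operator itself. A small numpy illustration (assuming a 1d second-difference matrix as a stand-in for the differential operator):

```python
import numpy as np

# 1d second-difference matrix: a stand-in for a discretized
# differential operator appearing in a PINN residual.
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

cond_A = np.linalg.cond(A)            # what a competitive formulation sees
cond_AtA = np.linalg.cond(A.T @ A)    # what least-squares training sees
# Squaring the operator squares its condition number, which caps
# the attainable accuracy in low-precision training.
```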
Poisson problem
Cahn-Hilliard
Nonlinear Schrödinger
Poisson problem (ablation study)
CONSTRAINED OPTIMIZATION
Application
Lagrangian Duality:
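As an illustrative link to CGD (my example, not from the slides): the Lagrangian L(x, λ) = f(x) + λ·g(x) turns a constrained problem into a zero-sum game between x and the multiplier λ. A numpy sketch of the CGD iteration on the toy problem min x² subject to x = 1, whose saddle point is (x, λ) = (1, -2):

```python
import numpy as np

def cgd_lagrangian(x, lam, eta=0.1, steps=300):
    # L(x, lam) = x**2 + lam * (x - 1); x minimizes, lam maximizes.
    # Zero-sum CGD update with mixed derivative D_{x,lam} L = 1.
    c = eta / (1 + eta**2)
    for _ in range(steps):
        gx, gl = 2 * x + lam, x - 1   # grad_x L, grad_lam L
        x, lam = x - c * (gx + eta * gl), lam + c * (gl - eta * gx)
    return x, lam

x, lam = cgd_lagrangian(0.0, 0.0)
# converges to the saddle point x ≈ 1, lam ≈ -2
```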
In Computer Graphics:
Constrained Willmore Surfaces
Yousuf Soliman, Albert Chern, Olga Diamanti, Felix Knöppel, Ulrich Pinkall and Peter Schröder,
SIGGRAPH 2021
Competitive Gradient Descent
Augmented Lagrangian
MULTI-AGENT REINFORCEMENT LEARNING
Application:
Competitive Policy Gradient:
Prajapat et al.1 use CGD in two-agent RL
Competitive policy gradient theorem for computing Hessian vector products
Competitive policy gradient (CoPG) greatly improves the quality of learned policies
1: Competitive Policy Optimization, Prajapat, Azizzadenesheli, Liniger, Yue, Anandkumar, UAI 2021
Markov Soccer (discrete state space):
Simultaneous policy gradient
Competitive policy gradient
CoPG lets agents learn to defend!
Summary
Extended gradient descent to competitive optimization
A bilinear approximation is natural for competitive games
Applied to GANs, graphics, and multi-agent RL
Further Work
Inequality constraints using dual geometry1
The “GAN dilemma”2
Polymatrix CGD for more than 2 players3
1: Competitive Mirror Descent, S, Anandkumar, and Owhadi
2: Implicit competitive regularization in GANs, S, Zheng, and Anandkumar, ICML 2020
3: Polymatrix Competitive Gradient Descent, Ma, Letcher, S, Shi, and Anandkumar
PROBABILISTIC VIEW ON CHOLESKY FACTORIZATION
Part 2:
Setting for rigorous results
Why should we care?
What do we need?
When using standard, dense linear algebra:
What’s new?
Algorithm I
S, Sullivan, Owhadi, SIAM MMS 2021
Tim Sullivan
Houman Owhadi
Our Method
Ordering and sparsity pattern
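The ordering can be sketched as the maximin (coarse-to-fine) ordering: always pick next the point farthest from all previously selected points, recording for each point a length scale l_i; the sparsity pattern then keeps pairs of points within a multiple of their length scales. A hypothetical minimal implementation:

```python
import numpy as np

def maximin_ordering(x):
    # Order points from coarse to fine: each next point maximizes
    # its distance to all previously selected points; that distance
    # becomes its length scale l_i.
    n = x.shape[0]
    order = [0]
    l = [np.inf]
    d = np.linalg.norm(x - x[0], axis=1)   # distance to the selected set
    for _ in range(n - 1):
        k = int(np.argmax(d))
        order.append(k)
        l.append(d[k])
        d = np.minimum(d, np.linalg.norm(x - x[k], axis=1))
    return np.array(order), np.array(l)
```

On five equispaced points on a line, starting from the left end, the ordering visits the far end first, then the midpoint, then the remaining points: scales are halved as the ordering proceeds.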
Incomplete Cholesky
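Zero fill-in incomplete Cholesky keeps the standard algorithm but never computes entries outside a prescribed sparsity pattern. A minimal dense sketch (with the full pattern it reduces to exact Cholesky):

```python
import numpy as np

def ichol(A, pattern):
    # Incomplete Cholesky: standard left-looking factorization,
    # but entries outside `pattern` are simply never computed.
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        for i in range(j + 1, n):
            if pattern[i, j]:
                L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L
```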
Why does it work?
Fade-out instead of Fill-in
Classical:
New:
Probabilistic view on Elimination
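The probabilistic interpretation: for x ~ N(0, Θ), eliminating a leading block produces the Schur complement, which is exactly the conditional covariance of the remaining variables; in particular, the squared Cholesky diagonal gives conditional variances. A quick numpy check (illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Theta = B @ B.T + 5 * np.eye(5)   # SPD covariance of x ~ N(0, Theta)
L = np.linalg.cholesky(Theta)

# L[k, k]**2 is the variance of x_k conditional on x_0, ..., x_{k-1},
# i.e. the scalar Schur complement of the leading k-by-k block.
k = 3
cond_var = Theta[k, k] - Theta[k, :k] @ np.linalg.solve(Theta[:k, :k], Theta[:k, k])
# cond_var == L[k, k]**2
```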
The Screening Effect
“The screening effect is the geostatistical term for the phenomenon of nearby observations tending to reduce the influence of more distant observations when using kriging (optimal linear prediction) for spatial interpolation” [Stein, 2012]
Conditioning on nearby points shortens the correlation length
Prove exponential decay1 using tools2,3,4 from numerical homogenization
1: S et al. (2020), 2: Målqvist & Peterseim (2014), 3: Owhadi (2015), 4: Owhadi & Scovel (2017)
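A minimal illustration of screening (my example, not from the slides): for the exponential kernel in 1d, which is Markov, conditioning on a point lying between two others removes their correlation entirely:

```python
import numpy as np

pts = np.array([0.0, 1.0, 3.0])   # the middle point screens 0.0 from 3.0
K = np.exp(-np.abs(pts[:, None] - pts[None, :]))   # exponential kernel

r_prior = K[0, 2]                 # correlation before conditioning

# Covariance of (x(0), x(3)) conditional on the nearby point x(1):
S = K[np.ix_([0, 2], [0, 2])] - np.outer(K[[0, 2], 1], K[1, [0, 2]]) / K[1, 1]
r_post = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
# r_prior = exp(-3) > 0, while r_post = 0: conditioning has
# collapsed the correlation between the outer points.
```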
Screening and Homogenization
Fine scale equation
Coarse grained PDE
Fine Scale
Coarse Scale
Multiscale basis
(Inverse) stiffness matrix in multiscale basis
Summary
1: S, Sullivan, Owhadi (2021)
RECONSTRUCTION FROM BLACK BOX SOLVERS
Algorithm II
S, Owhadi, 2021
Houman Owhadi
Our result:
Coloring and Peeling
1: Fast construction of hierarchical matrix representation from matrix-vector multiplication, Lin, Lu, Ying (2010)
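The coloring idea1 in its simplest form (my toy sketch): columns whose supports do not overlap can share one probing vector, so a tridiagonal matrix is recovered from just three black-box matrix-vector products:

```python
import numpy as np

rng = np.random.default_rng(0)
n, bw = 12, 1                          # tridiagonal matrix: bandwidth 1
main = rng.random(n) + 2
off = rng.random(n - 1)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Color columns mod (2 * bw + 1): same-colored columns touch disjoint
# rows, so one matvec per color reveals all of their entries at once.
ncolors = 2 * bw + 1
A_rec = np.zeros_like(A)
for c in range(ncolors):
    v = np.zeros(n)
    v[c::ncolors] = 1.0                # indicator of all columns of color c
    Av = A @ v                         # one black-box matrix-vector product
    for j in range(c, n, ncolors):
        for i in range(max(0, j - bw), min(n, j + bw + 1)):
            A_rec[i, j] = Av[i]
# A_rec now equals A, using 3 matvecs instead of 12
```

Peeling extends this to hierarchical matrices by recovering and subtracting one level of off-diagonal blocks at a time.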
Cholesky reconstruction
Algorithm III
Chen, S, Huang, and Desbrun, SIGGRAPH 2021
Jiong Chen
Jin Huang
Mathieu Desbrun
1: Schröder et al. (1978) 2: Gines et al. (1998) 3: Owhadi (2017)
1: Schröder et al. (1978) 2: Gines et al. (1998) 3: Ho and Ling (2013) 4: Owhadi (2017)
Fast solver by ICHOL(0)
In practice: Use as preconditioner for CG
Use multicolor ordering and supernodes
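A minimal dense sketch of CG preconditioned by a Cholesky-type factor M = L·Lᵀ (illustrative; in the actual solver L would be the ICHOL(0) factor, applied via sparse triangular solves):

```python
import numpy as np

def pcg(A, b, L, tol=1e-10, maxiter=200):
    # Conjugate gradients preconditioned by M = L @ L.T; the dense
    # np.linalg.solve calls stand in for triangular solves.
    x = np.zeros_like(b)
    r = b - A @ x
    z = np.linalg.solve(L.T, np.linalg.solve(L, r))
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = np.linalg.solve(L.T, np.linalg.solve(L, r))
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x
```

The closer L·Lᵀ is to A, the fewer iterations are needed; with the exact Cholesky factor, PCG converges in a single step.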
Chen, S, Huang, and Desbrun, SIGGRAPH 2021
Summary
SPARSE CHOLESKY FACTORS BY KL MINIMIZATION
Algorithm IV
S, Katzfuss, Owhadi, SIAM Scientific Computing 2021
Houman Owhadi
Matthias Katzfuß
Factorization by KL Minimization
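The KL-optimal column has a closed form (a Vecchia-type formula, sketched densely here as an illustration): restrict Θ to the sparsity set s_i of column i, invert that small block, and normalize its first column. With the full lower-triangular pattern this recovers the exact Cholesky factor of Θ⁻¹:

```python
import numpy as np

def kl_cholesky(Theta, pattern):
    # KL-optimal lower-triangular L with L @ L.T ≈ inv(Theta),
    # given a column sparsity pattern (True = entry allowed).
    # Each column needs only a small dense solve on its sparsity set.
    n = Theta.shape[0]
    L = np.zeros_like(Theta)
    for i in range(n):
        s = [j for j in range(i, n) if pattern[j, i]]  # index i comes first
        Ti = np.linalg.inv(Theta[np.ix_(s, s)])
        L[s, i] = Ti[:, 0] / np.sqrt(Ti[0, 0])
    return L
```

The columns are independent of one another, which is what makes the method embarrassingly parallel.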
1: Vecchia (1988), 2: Kaporin (1990), 3: Kolotilina and Yeremin (1992), 4: Marzouk et al. (2016)
Aggregation into Supernodes
Limitations of Screening
Density fluctuations: weakened decay
No boundary values: weak screening along the boundary
Experiment: Spatial Statistics
Experiment: Aggregation
Comparison with HSS Matrices
Summary
Further Work
KL minimization for solving nonlinear PDEs1
Screening-based nonlinear transport maps2
Near-optimal sparsity pattern by maximizing mutual information3
1: Sparse Cholesky factorization for solving nonlinear PDEs via Gaussian processes, Chen, Owhadi, S, in preparation
2: Scalable Bayesian transport maps for high-dimensional non-Gaussian spatial fields, Katzfuss, S, 2021
3: Sparse Cholesky factorization by greedy conditional selection, Huan, Guinness, Katzfuss, Owhadi, S, in preparation