1 of 32

Differentiation of blackbox combinatorial solvers

March 9, 2020

M. Rolínek

Autonomous Learning, MPI-IS Tübingen

2 of 32

BASED ON

  • [M. Vlastelica, A. Paulus, V. Musil, G. Martius, M. Rolínek, Differentiation of Blackbox Combinatorial Solvers, to appear at ICLR 2020 (spotlight)]

Review scores: 8, 8, 8

  • [M. Rolínek, V. Musil, A. Paulus, C. Michaelis, G. Martius, Optimizing Rank-Based Metrics with Blackbox Differentiation, to appear at CVPR 2020]

Review scores: strong accept, strong accept, strong accept

  • [Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers, under submission]

3 of 32

1.

THE TALE OF TWO WORLDS

4 of 32

Combinatorial optimization

Deep Learning

5 of 32

6 of 32

Standpoint: Fancy combinatorial optimization solvers are ridiculously strong...

...to the point that they undermine NP-hardness as a practical obstacle

7 of 32

Mission: Turn SOTA combinatorial optimization techniques into building blocks of neural networks.

...with minimal compromises! (a single hyperparameter λ)

8 of 32

IN PARTICULAR

  • Cover many combinatorial problems (shortest path, TSP, graph matching, multicut...)

  • Fast backward pass (one extra solver call, i.e. as fast as the forward pass)

  • Theoretically sound

  • Easy to use and well-performing in practice

9 of 32

WHY BOTHER?

“We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities.”

[Battaglia et al., Relational inductive biases, deep learning, and graph networks]

10 of 32

ARCHITECTURES FOR COMPOSITE PROBLEMS

11 of 32

2.

WHAT’S THE PROBLEM?

12 of 32

WHAT’S A SOLVER (FOR US)?

continuous input w

discrete output y

linear cost c(w, y) = ⟨w, y⟩

arbitrary constraints: y ranges over a feasible set Y

i.e. Solver(w) = argmin_{y ∈ Y} ⟨w, y⟩ (toy instance below)
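For concreteness, here is a toy instance of this interface (our own illustration, not from the talk): let Y be all 0/1 vectors with exactly k ones, so the minimizer simply picks the k cheapest coordinates.

```python
import torch

def top_k_solver(w: torch.Tensor, k: int = 3) -> torch.Tensor:
    """argmin_{y in Y} <w, y> with Y = {0/1 vectors with exactly k ones}:
    the optimum selects the k coordinates with the smallest costs."""
    y = torch.zeros_like(w)
    y[torch.topk(w, k, largest=False).indices] = 1.0
    return y
```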

13 of 32

EXAMPLE SOLVERS

  • A* (shortest path), MIN-CUT, MIN-COST PERFECT MATCHING, TRAVELING SALESMAN, ...

  • (many) integer programs, MRFs, CRFs, valued CSPs

14 of 32

THE GRADIENT EXISTS!

A small change of the input (edge weights) typically means zero change of the output (optimal path): the gradient exists, but it is zero almost everywhere.

15 of 32

  • The function is piecewise constant

  • This is NOT a gradient estimation problem

16 of 32

IMPORTANT NON-SOLUTIONS

  • Sample finite differences!

  • Apply smoothing!

  • Zero-order methods!

On a high-dimensional, piecewise constant function, all of these need many extra solver calls per gradient and yield noisy estimates at best.

17 of 32

3.

WHAT DO WE DO?

18 of 32

THE ALGORITHM

Return the gradient of L_λ(w), a continuous, piecewise affine “interpolation” of the true loss L(w).

The hyperparameter λ > 0 controls the “locality” of the interpolation: it trades off the informativeness of the gradient against faithfulness to the original L.
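A minimal PyTorch sketch of this scheme, following Algorithm 1 of the ICLR 2020 paper (class and variable names are ours; any solver with the interface above plugs in):

```python
import torch

class BlackboxSolver(torch.autograd.Function):
    """Differentiable wrapper around a blackbox combinatorial solver
    y(w) = argmin_{y in Y} <w, y> (Vlastelica et al., ICLR 2020)."""

    @staticmethod
    def forward(ctx, weights, solver, lambda_val):
        # Forward pass: one ordinary call to the solver.
        y = solver(weights)
        ctx.solver, ctx.lambda_val = solver, lambda_val
        ctx.save_for_backward(weights, y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        weights, y = ctx.saved_tensors
        # Perturb the costs in the direction of the incoming gradient dL/dy.
        w_prime = weights + ctx.lambda_val * grad_output
        # One extra solver call on the perturbed instance.
        y_lambda = ctx.solver(w_prime)
        # Gradient of the piecewise affine interpolation L_lambda.
        grad_weights = -(y - y_lambda) / ctx.lambda_val
        # No gradients for the solver callable and for lambda.
        return grad_weights, None, None
```

The backward pass costs exactly one extra solver call, which is where the “as fast as the forward pass” claim comes from; λ is the single hyperparameter mentioned earlier.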

19 of 32

2D EXAMPLE

20 of 32

2D EXAMPLE

21 of 32

THE ALGORITHM

22 of 32

BEHIND THE SCENES II

23 of 32

4.

EXPERIMENTS

24 of 32

STAGE I - PROOF OF CONCEPT

Synthetic problems. Combinatorics on raw images (sketched below).
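A condensed sketch of the “combinatorics on raw images” setup. Caveat: the ICLR experiments use Warcraft terrain maps, a CNN cost head and Dijkstra; the down/right dynamic program below is only a simplified stand-in, and the usage lines are hypothetical.

```python
import torch

def grid_shortest_path(costs: torch.Tensor) -> torch.Tensor:
    """Cheapest down/right path through a cost grid, returned as a 0/1
    indicator matrix. A toy stand-in for the Dijkstra solver of the paper."""
    c = costs.detach()
    n, m = c.shape
    dp = torch.full((n, m), float("inf"))
    dp[0, 0] = c[0, 0]
    for i in range(n):
        for j in range(m):
            if i > 0 and dp[i - 1, j] + c[i, j] < dp[i, j]:
                dp[i, j] = dp[i - 1, j] + c[i, j]
            if j > 0 and dp[i, j - 1] + c[i, j] < dp[i, j]:
                dp[i, j] = dp[i, j - 1] + c[i, j]
    y = torch.zeros_like(c)
    i, j = n - 1, m - 1
    y[i, j] = 1.0
    while (i, j) != (0, 0):  # backtrack along an optimal path
        if i > 0 and dp[i - 1, j] + c[i, j] == dp[i, j]:
            i -= 1
        else:
            j -= 1
        y[i, j] = 1.0
    return y

# Hypothetical end-to-end training step:
# costs = cnn(image)                                   # (n, m) cost grid
# path = BlackboxSolver.apply(costs, grid_shortest_path, 20.0)
# loss = (path * (1 - target) + (1 - path) * target).mean()  # Hamming
# loss.backward()
```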

25 of 32

26 of 32

RESULTS

[Differentiation of Blackbox Combinatorial Solvers, to appear at ICLR 2020]

27 of 32

STAGE II - LIGHTWEIGHT VISION

[Optimizing Rank-Based Metrics with Blackbox Differentiation, to appear at CVPR 2020]

  • Used for retrieval/detection metrics (mAP, Recall@k)

  • Blackbox solver = torch.argsort (yay!), sketched after this list

  • We can directly optimize these metrics out-of-the-box (within detection/retrieval pipelines)

  • Ranking is “easy” => many competing methods => we are roughly on par with SOTA (and fast!)
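Why torch.argsort: over permutation vectors y, argmin ⟨scores, y⟩ assigns rank 1 to the highest score (rearrangement inequality), so the solver call is just two argsorts. A sketch with our own names, reusing the wrapper from above; the usage lines are hypothetical:

```python
import torch

def ranking_solver(scores: torch.Tensor) -> torch.Tensor:
    """argmin over permutation vectors y of <scores, y>: by the
    rearrangement inequality the highest score receives rank 1,
    and two argsorts compute the minimizer."""
    order = torch.argsort(scores, descending=True)
    return torch.argsort(order).float() + 1.0  # 1-based ranks

# Hypothetical use inside a retrieval pipeline:
# ranks = BlackboxSolver.apply(model(batch), ranking_solver, 0.1)
# loss = ranks[is_positive].mean()   # a simple rank-based objective
# loss.backward()
```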

28 of 32

STAGE III - MIDDLEWEIGHT VISION

  • PascalVOC with Berkeley annotations

  • VGG keypoint features

  • SplineConv geometric features

  • SOTA combinatorial solver for graph matching (interface sketched below)
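The matching interface, sketched with SciPy's linear assignment as a simplified stand-in (the papers use a much stronger, QAP-based graph-matching solver; `matching_solver` and the commented usage names are ours):

```python
import numpy as np
import torch
from scipy.optimize import linear_sum_assignment

def matching_solver(costs: torch.Tensor) -> torch.Tensor:
    """argmin over permutation matrices y of <costs, y>: a linear
    assignment stand-in for the QAP solver used in the paper."""
    c = costs.detach().cpu().numpy()
    rows, cols = linear_sum_assignment(c)
    y = np.zeros_like(c)
    y[rows, cols] = 1.0
    return torch.from_numpy(y).to(costs)

# Hypothetical keypoint-matching step:
# costs = feature_affinities(img_a, img_b)        # (n, n) cost matrix
# match = BlackboxSolver.apply(costs, matching_solver, 80.0)
# loss = (match * (1 - gt_match)).sum(); loss.backward()
```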

29 of 32

STAGE III - MIDDLEWEIGHT VISION

30 of 32

SUMMARY

  • We can train architectures that contain blackbox combinatorial solvers.

  • Many combinatorial optimization problems are covered.

  • We have a mathematical understanding of the situation.

  • Competitive architectural principle, easy to implement.

31 of 32

WORDS OF CAUTION

  • Not everything that seems combinatorial actually is.

  • Our framework requires knowing the exact specification of the combinatorial problem.

  • Only practical up to medium-sized instances.

  • Even with ideal smoothing, the loss landscape may still be hard to optimize.

32 of 32