1 of 32

Differentiation of blackbox combinatorial solvers

March 9, 2020

M. Rolínek

Autonomous Learning, MPI-IS Tübingen

2 of 32

BASED ON

  • [M. Vlastelica, A. Paulus, V. Musil, G. Martius, M. Rolínek, Differentiation of Blackbox Combinatorial Solvers, to appear at ICLR 2020 (spotlight)]

Review scores: 8, 8, 8

  • [M. Rolínek, V. Musil, A. Paulus, C. Michaelis, G. Martius, Optimizing Rank-Based Metrics with Blackbox Differentiation, to appear at CVPR 2020]

Review scores: strong accept, strong accept, strong accept

  • [Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers, under submission]

3 of 32

1.

THE TALE OF TWO WORLDS

4 of 32

Combinatorial optimization

Deep Learning

5 of 32

6 of 32

Standpoint: Fancy combinatorial optimization solvers are ridiculously strong...

...to the point that they undermine NP-hardness as a practical obstacle

7 of 32

Mission: Turn SOTA combinatorial optimization techniques into building blocks of neural networks.

...with minimal compromises! (a single hyperparameter λ)

8 of 32

IN PARTICULAR

  • Cover many combinatorial problems (shortest path, TSP, graph matching, multicut...)

  • Fast backward pass (one extra solver call, i.e. as fast as the forward pass)

  • Theoretically sound

  • Easy to use and well-performing in practice

9 of 32

WHY BOTHER?

“We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities.”

[Battaglia et al., Relational inductive biases, deep learning, and graph networks]

10 of 32

ARCHITECTURES FOR COMPOSITE PROBLEMS

11 of 32

2.

WHAT’S THE PROBLEM?

12 of 32

WHAT’S A SOLVER (FOR US)?

continuous input w

discrete output y

linear cost c(w, y) = ⟨w, y⟩

arbitrary constraints: y ranges over a feasible set Y

i.e. Solver(w) = argmin_{y ∈ Y} ⟨w, y⟩ (toy instance below)
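For concreteness, here is a toy instance of this interface (our own illustration, not from the talk): let Y be all 0/1 vectors with exactly k ones, so the minimizer simply picks the k cheapest coordinates.

```python
import torch

def top_k_solver(w: torch.Tensor, k: int = 3) -> torch.Tensor:
    """argmin_{y in Y} <w, y> with Y = {0/1 vectors with exactly k ones}:
    the optimum selects the k coordinates with the smallest costs."""
    y = torch.zeros_like(w)
    y[torch.topk(w, k, largest=False).indices] = 1.0
    return y
```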

13 of 32

EXAMPLE SOLVERS

  • A* (shortest path), MIN-CUT, MIN-COST PERFECT MATCHING, TRAVELING SALESMAN, ...

  • (many) integer programs, MRFs, CRFs, valued CSPs

14 of 32

THE GRADIENT EXISTS!

A small change of the input (edge weights) typically means zero change of the output (optimal path): the gradient exists, but it is zero almost everywhere.

15 of 32

  • The function is piecewise constant

  • This is NOT a gradient estimation problem

16 of 32

IMPORTANT NON-SOLUTIONS

  • Sample finite differences!

  • Apply smoothing!

  • Zero-order methods!

On a high-dimensional, piecewise constant function, all of these need many extra solver calls per gradient and yield noisy estimates at best.

17 of 32

3.

WHAT DO WE DO?

18 of 32

THE ALGORITHM

Return the gradient of L_λ(w), a continuous, piecewise affine “interpolation” of the true loss L(w).

The hyperparameter λ > 0 controls the “locality” of the interpolation: it trades off the informativeness of the gradient against faithfulness to the original L.
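A minimal PyTorch sketch of this scheme, following Algorithm 1 of the ICLR 2020 paper (class and variable names are ours; any solver with the interface above plugs in):

```python
import torch

class BlackboxSolver(torch.autograd.Function):
    """Differentiable wrapper around a blackbox combinatorial solver
    y(w) = argmin_{y in Y} <w, y> (Vlastelica et al., ICLR 2020)."""

    @staticmethod
    def forward(ctx, weights, solver, lambda_val):
        # Forward pass: one ordinary call to the solver.
        y = solver(weights)
        ctx.solver, ctx.lambda_val = solver, lambda_val
        ctx.save_for_backward(weights, y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        weights, y = ctx.saved_tensors
        # Perturb the costs in the direction of the incoming gradient dL/dy.
        w_prime = weights + ctx.lambda_val * grad_output
        # One extra solver call on the perturbed instance.
        y_lambda = ctx.solver(w_prime)
        # Gradient of the piecewise affine interpolation L_lambda.
        grad_weights = -(y - y_lambda) / ctx.lambda_val
        # No gradients for the solver callable and for lambda.
        return grad_weights, None, None
```

The backward pass costs exactly one extra solver call, which is where the “as fast as the forward pass” claim comes from; λ is the single hyperparameter mentioned earlier.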

19 of 32

2D EXAMPLE

20 of 32

2D EXAMPLE

21 of 32

THE ALGORITHM

22 of 32

BEHIND THE SCENES II

23 of 32

4.

EXPERIMENTS

24 of 32

STAGE I - PROOF OF CONCEPT

Synthetic problems. Combinatorics on raw images (sketched below).
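A condensed sketch of the “combinatorics on raw images” setup. Caveat: the ICLR experiments use Warcraft terrain maps, a CNN cost head and Dijkstra; the down/right dynamic program below is only a simplified stand-in, and the usage lines are hypothetical.

```python
import torch

def grid_shortest_path(costs: torch.Tensor) -> torch.Tensor:
    """Cheapest down/right path through a cost grid, returned as a 0/1
    indicator matrix. A toy stand-in for the Dijkstra solver of the paper."""
    c = costs.detach()
    n, m = c.shape
    dp = torch.full((n, m), float("inf"))
    dp[0, 0] = c[0, 0]
    for i in range(n):
        for j in range(m):
            if i > 0 and dp[i - 1, j] + c[i, j] < dp[i, j]:
                dp[i, j] = dp[i - 1, j] + c[i, j]
            if j > 0 and dp[i, j - 1] + c[i, j] < dp[i, j]:
                dp[i, j] = dp[i, j - 1] + c[i, j]
    y = torch.zeros_like(c)
    i, j = n - 1, m - 1
    y[i, j] = 1.0
    while (i, j) != (0, 0):  # backtrack along an optimal path
        if i > 0 and dp[i - 1, j] + c[i, j] == dp[i, j]:
            i -= 1
        else:
            j -= 1
        y[i, j] = 1.0
    return y

# Hypothetical end-to-end training step:
# costs = cnn(image)                                   # (n, m) cost grid
# path = BlackboxSolver.apply(costs, grid_shortest_path, 20.0)
# loss = (path * (1 - target) + (1 - path) * target).mean()  # Hamming
# loss.backward()
```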

25 of 32

26 of 32

RESULTS

[Differentiation of Blackbox Combinatorial Solvers, to appear at ICLR 2020]

27 of 32

STAGE II - LIGHTWEIGHT VISION

[Optimizing Rank-Based Metrics with Blackbox Differentiation, to appear at CVPR 2020]

  • Used for retrieval/detection metrics (mAP, Recall@k)

  • Blackbox solver = torch.argsort (yay!), sketched after this list

  • We can directly optimize these metrics out-of-the-box (within detection/retrieval pipelines)

  • Ranking is “easy” => many competing methods => we are roughly on par with SOTA (and fast!)
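Why torch.argsort: over permutation vectors y, argmin ⟨scores, y⟩ assigns rank 1 to the highest score (rearrangement inequality), so the solver call is just two argsorts. A sketch with our own names, reusing the wrapper from above; the usage lines are hypothetical:

```python
import torch

def ranking_solver(scores: torch.Tensor) -> torch.Tensor:
    """argmin over permutation vectors y of <scores, y>: by the
    rearrangement inequality the highest score receives rank 1,
    and two argsorts compute the minimizer."""
    order = torch.argsort(scores, descending=True)
    return torch.argsort(order).float() + 1.0  # 1-based ranks

# Hypothetical use inside a retrieval pipeline:
# ranks = BlackboxSolver.apply(model(batch), ranking_solver, 0.1)
# loss = ranks[is_positive].mean()   # a simple rank-based objective
# loss.backward()
```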

28 of 32

STAGE III - MIDDLEWEIGHT VISION

  • PascalVOC with Berkeley annotations

  • VGG keypoint features

  • SplineConv geometric features

  • SOTA combinatorial solver for graph matching (interface sketched below)
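The matching interface, sketched with SciPy's linear assignment as a simplified stand-in (the papers use a much stronger, QAP-based graph-matching solver; `matching_solver` and the commented usage names are ours):

```python
import numpy as np
import torch
from scipy.optimize import linear_sum_assignment

def matching_solver(costs: torch.Tensor) -> torch.Tensor:
    """argmin over permutation matrices y of <costs, y>: a linear
    assignment stand-in for the QAP solver used in the paper."""
    c = costs.detach().cpu().numpy()
    rows, cols = linear_sum_assignment(c)
    y = np.zeros_like(c)
    y[rows, cols] = 1.0
    return torch.from_numpy(y).to(costs)

# Hypothetical keypoint-matching step:
# costs = feature_affinities(img_a, img_b)        # (n, n) cost matrix
# match = BlackboxSolver.apply(costs, matching_solver, 80.0)
# loss = (match * (1 - gt_match)).sum(); loss.backward()
```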

29 of 32

STAGE III - MIDDLEWEIGHT VISION

30 of 32

SUMMARY

  • We can train architectures that contain blackbox combinatorial solvers.

  • Many combinatorial optimization problems are covered.

  • We have a mathematical understanding of the situation.

  • Competitive architectural principle, easy to implement.

31 of 32

WORDS OF CAUTION

  • Not everything that seems combinatorial actually is.

  • Our framework requires knowing the exact specification of the combinatorial problem.

  • Only practical up to medium-sized instances.

  • Even with ideal smoothing, the loss landscape may still be hard to optimize.

32 of 32