1 of 17

Reinforced Feature Points:

Optimizing Feature Detection and Description for a High-Level Task

Aritra Bhowmik, Stefan Gumhold, Carsten Rother, Eric Brachmann

Presented by: Arash Sadeghi Amjadi


2 of 17

Introduction

  1. Task: Image matching, a core problem of computer vision.
  2. Practical usage of image matching
  3. History of the task
  4. In this work, a complete vision pipeline is designed, specifically one that solves the task of relative pose estimation.


3 of 17

Introduction

  • Contribution
    1. A new training methodology
    2. Applying the proposed method to a state-of-the-art architecture
    3. After training, SuperPoint reaches, and slightly exceeds, the accuracy of SIFT


4 of 17

Method

  • Task: estimate the relative transformation between two images, I and I’
  • Keypoints x_i of image I are found by the “detection network”
  • Each keypoint x_i is described by the “description network”, which outputs the descriptor d(x_i; w)
  • Both networks use a joint architecture, SuperPoint, with most of the weights w shared between the two networks.


5 of 17

Method

  • Main goal: optimize the learnable parameters w to improve accuracy on the high-level task.
  • Problem: keypoint selection and feature matching are discrete, non-differentiable operations. Other components of the vision pipeline may also be non-differentiable.
  • Solution: borrow from reinforcement learning and formulate feature detection and matching as probabilistic actions.


6 of 17

Method


7 of 17

Method

Part 1: Probabilistic Key Point Selection

  1. Reformulating keypoint heatmap as a probability distribution over key point locations parameterized by w.
  2. Sampling N key points independently

  • Joint probability of sampling key points independently in each image
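The formula for this step is not preserved in the text of the slide. Under the independence assumption stated above, the probability of a sampled set X would factor as P(X; w) = prod_i P(x_i; w), and the joint over both images as P(X, X’; w) = P(X; w) P(X’; w). The following is a minimal sketch of that sampling step, not the authors' implementation: `sample_keypoints`, the toy heatmap, and the use of plain Python lists are all assumptions made for illustration.

```python
import math
import random

def sample_keypoints(heatmap, n, rng):
    """Sample n keypoint locations independently from a detector heatmap.

    heatmap: 2D list of non-negative scores (a hypothetical stand-in for the
    detector output). Returns the sampled (row, col) locations and the log of
    their joint probability, i.e. sum_i log P(x_i; w).
    """
    total = sum(sum(row) for row in heatmap)
    locations = [(r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0]))]
    # Normalize the scores so they form a distribution over image locations.
    probs = [heatmap[r][c] / total for r, c in locations]
    # Independent draws (with replacement) from that distribution.
    idx = rng.choices(range(len(locations)), weights=probs, k=n)
    points = [locations[i] for i in idx]
    log_prob = sum(math.log(probs[i]) for i in idx)
    return points, log_prob

rng = random.Random(0)
heatmap = [[0.0] * 5 for _ in range(4)]
heatmap[1][2] = 3.0   # a strong detector response
heatmap[3][0] = 1.0   # a weaker one
points_a, log_p_a = sample_keypoints(heatmap, n=6, rng=rng)
points_b, log_p_b = sample_keypoints(heatmap, n=6, rng=rng)
# Independence across the two images: log P(X, X'; w) = log P(X; w) + log P(X'; w)
joint_log_prob = log_p_a + log_p_b
```

Working in log space keeps the product of many small probabilities numerically stable, and the log-probability is also the quantity a REINFORCE-style update needs.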


8 of 17

Method

Part 2: Probabilistic Feature Matching

  1. Probability of match between two key points with descriptors denoted as

  • Complete set of M matches between I and I’ sampled independently
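The match probability itself is missing from the text of this slide. A minimal sketch follows, assuming the probability of a match is a softmax over negative descriptor distances taken across all candidate pairs; that choice, the function names `match_distribution` and `sample_matches`, and the toy 2D descriptors are illustrative assumptions, not a statement of the authors' exact definition.

```python
import math
import random

def match_distribution(desc_a, desc_b):
    """Return all candidate pairs (i, j) and a probability for each.

    Assumption: P(match i, j) is proportional to exp(-distance) between the
    descriptors, normalized over every candidate pair.
    """
    pairs, scores = [], []
    for i, da in enumerate(desc_a):
        for j, db in enumerate(desc_b):
            pairs.append((i, j))
            scores.append(math.exp(-math.dist(da, db)))
    total = sum(scores)
    return pairs, [s / total for s in scores]

def sample_matches(desc_a, desc_b, m, rng):
    """Sample m matches independently; also return their joint log-probability."""
    pairs, probs = match_distribution(desc_a, desc_b)
    idx = rng.choices(range(len(pairs)), weights=probs, k=m)
    matches = [pairs[i] for i in idx]
    log_prob = sum(math.log(probs[i]) for i in idx)
    return matches, log_prob

# Toy descriptors: keypoint 0 in I resembles keypoint 0 in I', and 1 resembles 1.
desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (5.0, 5.2)]
pairs, probs = match_distribution(desc_a, desc_b)
matches, log_prob = sample_matches(desc_a, desc_b, m=4, rng=random.Random(0))
```

Pairs with similar descriptors receive most of the probability mass, so unrelated pairs are rarely sampled.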


9 of 17

Method

Part 3: Learning Objective

  1. Training data comes as triples (I, I’, T*), where T* is the ground-truth transformation.
  2. The scalar loss l(M, X, X’) depends on the key points X and X’ and on the matches M that were selected among those key points.
  3. Loss to be minimized:
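The objective itself is not preserved in the text of this slide. One way to write an expected-loss objective consistent with the definitions above is sketched here, with the scalar loss l written as \ell; the factorization of the joint distribution into a matching term and a keypoint term is an assumption that follows from sampling key points first and matches second.

```latex
\mathcal{L}(w) \;=\; \mathbb{E}_{M, X, X' \sim P(M, X, X';\, w)}
  \big[\, \ell(M, X, X') \,\big],
\qquad
P(M, X, X';\, w) \;=\; P(M \mid X, X';\, w)\; P(X, X';\, w)
```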

  • Problem: calculating the expectation exactly is infeasible. Solution: start from an already initialized, pre-trained network such as SuperPoint. For such a network:
    1. The heatmap predicted by the detector is sparse, so only a few image locations have an impact.
    2. Matches among unrelated key points have a large descriptor distance and therefore no impact on the expectation.


10 of 17

Method

Part 3: Learning Objective

  1. Update the learnable parameters w following the classic REINFORCE algorithm:
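The update rule is not preserved in the text of this slide. REINFORCE rests on the score-function identity d/dw E[l] = E[l * d/dw log P(action; w)], which needs only the gradient of the log-probability and not of the loss. The sketch below illustrates that identity on a toy problem: a softmax distribution over three discrete actions stands in for keypoint and match sampling, and a lookup table stands in for the pipeline loss. The names, the toy loss values, and the mean-loss baseline used for variance reduction are all assumptions for illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_gradient(logits, loss_fn, n_samples, rng):
    """Monte Carlo estimate of d E[loss] / d logits via the score function."""
    probs = softmax(logits)
    actions = rng.choices(range(len(probs)), weights=probs, k=n_samples)
    losses = [loss_fn(a) for a in actions]   # loss_fn may be non-differentiable
    baseline = sum(losses) / n_samples       # mean loss, reduces variance
    grad = [0.0] * len(logits)
    for a, loss in zip(actions, losses):
        for j, p in enumerate(probs):
            # d log softmax(a) / d logit_j = 1[a == j] - p_j
            dlogp = (1.0 if j == a else 0.0) - p
            grad[j] += (loss - baseline) * dlogp / n_samples
    return grad

def exact_gradient(logits, losses):
    """Closed-form gradient of sum_k p_k * loss_k, for comparison."""
    probs = softmax(logits)
    expected = sum(p * l for p, l in zip(probs, losses))
    return [p * (l - expected) for p, l in zip(probs, losses)]

logits = [0.0, 0.5, -0.5]   # toy parameters w
losses = [1.0, 0.2, 2.0]    # toy loss per action
estimate = reinforce_gradient(logits, lambda a: losses[a], 20000, random.Random(0))
exact = exact_gradient(logits, losses)

# One gradient-descent step using the sampled estimate.
step = 0.1
updated = [w - step * g for w, g in zip(logits, estimate)]
```

The sampled estimate approaches the exact gradient as the number of samples grows, and the descent step shifts probability toward the action with the lowest loss.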


11 of 17

Experiments

Part 1: Relative pose estimation

  1. Network architecture
  2. Task Description
  3. Datasets
  4. Training
  5. Test


12 of 17

Experiments

Part 1: Relative pose estimation


13 of 17

Experiments

Part 1: Relative pose estimation


14 of 17

Experiments

Part 2: Low-Level Matching Accuracy


15 of 17

Experiments

Part 2: Low-Level Matching Accuracy


16 of 17

Experiments

Part 3: Structure-from-Motion


17 of 17

Thank you for your attention!
