1 of 25

Finding Task-relevant Features

for Few-shot Learning by Category Traversal

Hongyang Li1, David Eigen2, Samuel Dodge2, Matt Zeiler2, Xiaogang Wang1

1 The Chinese University of Hong Kong

2 Clarifai Inc.

2 of 25

Introduction

Few-shot learning problem

Given only a few samples (K) per class in the support set, we separate the feature space into N clusters/classes by comparing the feature distances between the query input and the support samples.

[Figure: the support samples divide the feature space into three classes.]

3 of 25

Introduction

[Figure: the query input is embedded into the same feature space as the support samples.]

4 of 25

Introduction

[Figure: the query input is nearest to the orange cluster, so it belongs to the orange class.]

5 of 25

Introduction

Support set: used as the reference to learn the feature embeddings.

Query set: used as the prediction to be compared with the support set (i.e., the loss).

The illustrated example is a 3-way, 4-shot problem.

Higher N and smaller K make the setting harder.
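As a concrete illustration, sampling one N-way K-shot episode can be sketched as below; the `data` layout (a dict from class id to a sample array) and the toy dataset are hypothetical stand-ins for a real image dataset:

```python
import numpy as np

def sample_episode(data, n_way=3, k_shot=4, n_query=15, rng=None):
    """Sample an N-way K-shot episode from `data`, a dict mapping a class id
    to an array of its samples (a hypothetical layout for illustration)."""
    rng = rng if rng is not None else np.random.default_rng()
    classes = rng.choice(list(data), size=n_way, replace=False)
    support, query = {}, {}
    for c in classes:
        idx = rng.permutation(len(data[c]))
        support[c] = data[c][idx[:k_shot]]                # K reference shots
        query[c] = data[c][idx[k_shot:k_shot + n_query]]  # queries for the loss
    return support, query

# toy dataset: 5 classes, 20 one-dimensional samples each
toy = {c: np.arange(20).reshape(20, 1) + 100 * c for c in range(5)}
support, query = sample_episode(toy, n_way=3, k_shot=4, n_query=15)
```

Each test episode in the experiments later in the deck follows this shape, with 15 queries per class.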


7 of 25

Introduction

Metric-based methods

  • Matching Network (Vinyals et al., NIPS 2016): sample-wise comparison.

  • Prototypical Network (Snell et al., NIPS 2017): cluster-wise comparison; it computes the average of the samples within each class (called the prototype).

  • Relation Network (Sung et al., CVPR 2018): learnable relation comparison via a relation module.

8 of 25

Introduction

Metric-based methods: a high-level summary

Feature extractor -> Comparison -> Loss (metric learning)
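This pipeline can be summarized in a few lines. The sketch below follows the Prototypical-Network variant (average per class, then nearest Euclidean distance), with toy 2-D embeddings standing in for the feature extractor's output:

```python
import numpy as np

def prototypes(support, labels, n_way):
    """Comparison stage, cluster-wise: average the embedded support samples
    of each class into one prototype."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])

def classify(query, protos):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return int(np.argmin(np.linalg.norm(protos - query, axis=1)))

# toy embeddings for a 3-way 2-shot episode (2-D features)
support = np.array([[0.0, 0.0], [0.2, 0.0],
                    [5.0, 5.0], [5.0, 5.2],
                    [0.0, 9.0], [0.2, 9.0]])
labels = np.array([0, 0, 1, 1, 2, 2])
protos = prototypes(support, labels, n_way=3)
pred = classify(np.array([4.8, 5.1]), protos)  # lands nearest class 1
```

Swapping the distance for a learned relation module recovers the Relation-Network flavor of the same pipeline.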

9 of 25

Side: approaches beyond metric-based ones

Optimization-based

The learner samples tasks from a distribution and performs a few iterations of SGD or unrolled weight updates to adapt a parameterized model to the particular task at hand.
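A minimal numeric sketch of this inner loop, using a linear model with a mean-squared-error loss as a stand-in for the parameterized learner (the model, loss, learning rate, and step count are illustrative assumptions, not the method of any specific paper):

```python
import numpy as np

def inner_loop_adapt(w, X, y, lr=0.1, steps=5):
    """Adapt parameters w to one task's support set (X, y) with a few
    gradient steps; a stand-in for MAML-style unrolled updates."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(X)  # gradient of the MSE loss
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))            # 8 support samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])     # targets from a hidden linear task
w0 = np.zeros(3)
w_adapted = inner_loop_adapt(w0, X, y)
loss_before = np.mean((X @ w0 - y) ** 2)
loss_after = np.mean((X @ w_adapted - y) ** 2)
```

The few inner steps reduce the task loss; the outer loop (not shown) would optimize the initialization w0 across many tasks.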


11 of 25

Side: approaches beyond metric-based ones

Large-training-corpus-based

Given a well-trained model (say, ImageNet-1k classification), how can we quickly adapt to new classes with few samples without forgetting the old categories?

  • Dynamic Few-Shot Visual Learning without Forgetting, CVPR 2018

  • Incremental few-shot learning with attention attractor networks, arXiv

12 of 25

What’s wrong with existing methods?

Consider a 5-way 1-shot problem where each sample has only two features (color and shape). Given a query (green, circle), which class (i - v) does the query belong to?

[Figure: five classes, one sample each.]


14 of 25

What’s wrong with existing methods?

Problematic: multiple choices lie at equal distances from the query!

15 of 25

What’s wrong with existing methods?

Problematic: multiple choices lie at equal distances from the query!

Find the relevant features by looking at all classes (i - v) simultaneously: shape is shared across classes, while color is unique!

16 of 25

What’s wrong with existing methods?

Since color is unique, the query belongs to class iii by considering color only.

This motivates us to look for inter-class uniqueness by traversing the features of all classes in the support set (the projector in CTM).

17 of 25

Extending one-shot to K-shot case

Given 5 shots for one class, we look for the commonality within the class; in this case, the color dimension is more representative than the shape.

Previous methods widely achieve this by averaging the samples within one class.

18 of 25

Solution: Category Traversal Module

The proposed CTM module is a simple way to realize both motivations above. It consists of two parts: a concentrator and a projector.

Suppose the output of the feature extractor is a feature tensor over the N-way, K-shot support set.

19 of 25

Solution: Category Traversal Module

Concentrator: intra-class commonality

  • Aim: find the universal features shared by all instances of one class.
  • Implementation: a simple CNN layer or a ResNet block.
  • Shown to perform better than the averaging alternative.
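The concentrator's role can be sketched at the shape level: collapse the K shots of each class into one class feature. Below, the averaging baseline sits next to a linear map standing in for the learnable CNN/ResNet block (the weight matrix `W` and all dimensions are hypothetical placeholders):

```python
import numpy as np

def concentrator_avg(feats):
    """Averaging baseline: collapse the K shots of each class into one
    feature. feats: (N, K, d) -> (N, d)."""
    return feats.mean(axis=1)

def concentrator_learned(feats, W):
    """Learnable sketch: a linear map over the stacked K-shot features,
    standing in for the CNN layer / ResNet block used in CTM.
    feats: (N, K, d), W: (K*d, d) -> (N, d)."""
    N, K, d = feats.shape
    return feats.reshape(N, K * d) @ W

rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 3, 4))   # 5-way, 3-shot, 4-dim features
W = rng.normal(size=(12, 4))         # hypothetical learned weights
avg_out = concentrator_avg(feats)
learned_out = concentrator_learned(feats, W)
```

Both paths output one feature per class; the learned variant can weight shots unequally, which is what lets it beat plain averaging.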

20 of 25

Solution: Category Traversal Module

Projector: inter-class uniqueness

  • Aim: mask out irrelevant features and select the ones most discriminative for the current few-shot task.
  • Implementation: reshape and concatenate the N class features along the channel dimension, then apply a CNN layer with a small kernel; this traverses all classes at once and yields the final mask.
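A shape-level sketch of the projector: the N class features are concatenated along the channel dimension and mapped to one softmax mask over channels. A linear map `W` stands in for the small-kernel CNN, and its weights are hypothetical:

```python
import numpy as np

def projector(class_feats, W):
    """Traverse all N class features at once: concatenate them along the
    channel dimension and produce a softmax mask p over the d channels.
    class_feats: (N, d), W: (N*d, d) -> p: (d,)."""
    z = class_feats.reshape(-1) @ W  # one glance at all N classes
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(2)
class_feats = rng.normal(size=(5, 4))  # N=5 classes, d=4 channels
W = rng.normal(size=(20, 4))           # hypothetical learned weights
p = projector(class_feats, W)
```

Because `W` sees all classes jointly, channels that vary across classes (like "color" in the toy example) can receive large mask values while shared channels are suppressed.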

21 of 25

Solution: Category Traversal Module

Operations after CTM

Once we have the mask p, both the support and query features are multiplied element-wise by p before being fed into the metric-learning module.
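Revisiting the earlier color/shape toy example shows why this helps: two support classes tie with the query in the full feature space, and multiplying all features element-wise by a mask p that keeps only the discriminative dimension breaks the tie (the feature values and the mask here are illustrative, not learned):

```python
import numpy as np

def nearest(support, query):
    """Index of the support sample nearest to the query (Euclidean)."""
    return int(np.argmin(np.linalg.norm(support - query, axis=1)))

# dims are (color, shape); both classes sit at distance 1 from the query
support = np.array([[1.0, 4.0],   # class A: different color, same shape
                    [0.0, 5.0]])  # class B: same color, different shape
query = np.array([0.0, 4.0])

p = np.array([1.0, 0.0])          # mask keeping only the color channel
pred = nearest(support * p, query * p)  # tie resolved in favor of class B
```

Without the mask the metric module has no basis to prefer either class; with it, only the task-relevant channel contributes to the distance.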

22 of 25

Category Traversal Module vs Previous Approaches

Previous approaches (metric-based) compare raw embeddings; with the Category Traversal Module, the features become more representative of the current task.

23 of 25

Experiments - CTM as a plug-and-play module in existing methods

The mean accuracy (%) over 600 randomly generated episodes under N-way, K-shot settings. In every test episode, each class has 15 queries.

[Tables: results on miniImageNet and tieredImageNet.]

24 of 25

Experiments - comparison with SOTAs

[Table: comparison with state-of-the-art methods, grouped into optimization-based, large-training-corpus-based, and metric-based approaches.]

25 of 25

Finding Task-relevant Features

for Few-shot Learning by Category Traversal

Hongyang Li, David Eigen, Samuel Dodge, Matt Zeiler, Xiaogang Wang

Poster # 1

starts at 10:15 AM

Code --->

yangli@ee.cuhk.edu.hk