Finding Task-relevant Features
for Few-shot Learning by Category Traversal
Hongyang Li¹, David Eigen², Samuel Dodge², Matt Zeiler², Xiaogang Wang¹
¹ The Chinese University of Hong Kong
² Clarifai Inc.
Introduction

Few-shot learning problem
Given only a few samples (K) for each class in the support set, we separate the feature space into N clusters/classes by comparing the feature distance between the query input and the support samples.

[Figure: the support samples and the query input are mapped by an embedding into a feature space divided into three clusters; the query is assigned to its nearest cluster, so here it belongs to the orange class.]

Support set: used as reference to learn the feature embeddings.
Query set: used as prediction to be compared with the support set (i.e., the loss).

The illustration above is a 3-way, 4-shot problem. A higher N and a smaller K make the setting harder.
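To make the episode setup concrete, here is a minimal sketch of nearest-support classification in an embedded feature space; the function names, the 2-d embeddings, and the numbers are our own illustration, not from the paper.

    import numpy as np

    def classify_query(query_emb, support_embs, support_labels):
        # Assign the query to the class of its nearest support embedding.
        dists = np.linalg.norm(support_embs - query_emb, axis=1)   # distance to every support sample
        return support_labels[np.argmin(dists)]

    # Toy 3-way, 4-shot episode with 2-d embeddings (numbers are illustrative only).
    rng = np.random.default_rng(0)
    centers = np.array([[0., 0.], [5., 0.], [0., 5.]])             # one cluster center per class
    support = np.vstack([c + 0.3 * rng.standard_normal((4, 2)) for c in centers])
    labels = np.repeat(np.arange(3), 4)
    query = np.array([4.8, 0.2])                                   # lands nearest to class 1
    print(classify_query(query, support, labels))                  # -> 1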
Introduction

Metric-based methods
Matching Network (Vinyals et al., NIPS 2016): sample-wise comparison between the query and each support sample.
Prototypical Network (Snell et al., NIPS 2017): cluster-wise comparison; compute the average of the samples within each class (called the prototype).
Relation Network (Sung et al., CVPR 2018): learnable relation comparison via a relation module.
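The two comparison styles can be sketched as follows. This is a simplification: Matching Network actually uses attention over cosine similarities, and the function names here are ours, not from the papers.

    import numpy as np

    def sample_wise_scores(query, support, labels, n_way):
        # Matching-Network style idea: compare the query with individual support
        # samples (nearest-sample distance used here only to illustrate "sample-wise").
        dists = np.linalg.norm(support - query, axis=1)
        return np.array([dists[labels == c].min() for c in range(n_way)])

    def cluster_wise_scores(query, support, labels, n_way):
        # Prototypical-Network style: compare the query with each class prototype,
        # i.e. the mean of that class's support samples.
        prototypes = np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])
        return np.linalg.norm(prototypes - query, axis=1)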
Introduction

Metric-based methods - a high-level sum-up
Pipeline: feature extractor -> comparison (metric learning) -> loss.
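A minimal sketch of this shared pipeline, using a prototype-style comparison as the metric; the encoder stands for any CNN feature extractor and is an assumption of this sketch.

    import torch
    import torch.nn.functional as F

    def episode_loss(encoder, support_x, support_y, query_x, query_y, n_way):
        # Feature extractor -> comparison -> loss, with a prototype-style comparison.
        z_s = encoder(support_x)                                   # (N*K, d) support embeddings
        z_q = encoder(query_x)                                     # (Q, d)   query embeddings
        prototypes = torch.stack([z_s[support_y == c].mean(0) for c in range(n_way)])  # (N, d)
        logits = -torch.cdist(z_q, prototypes)                     # smaller distance -> larger logit
        return F.cross_entropy(logits, query_y)                    # query labels are class indices 0..N-1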
Side: approaches beyond metric-based ones

Optimization-based
The learner samples tasks from a distribution and performs SGD or unrolled weight updates for a few iterations to adapt a parameterized model to the particular task at hand.
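As a rough illustration of the inner-loop idea, here is a MAML-style unrolled update on a plain linear classifier; the setup and names are our own, not a specific published method.

    import torch
    import torch.nn.functional as F

    def inner_loop_adapt(w, support_x, support_y, inner_lr=0.01, steps=5):
        # A few unrolled SGD steps on the support set adapt a copy of the weights of a
        # linear classifier to the task at hand; the adapted weights would then be
        # evaluated on the query set, and the outer loop updates the initialization.
        w = w.clone()
        for _ in range(steps):
            loss = F.cross_entropy(support_x @ w, support_y)
            (grad,) = torch.autograd.grad(loss, w, create_graph=True)  # keep the graph: unrolled update
            w = w - inner_lr * grad
        return w

    # Usage sketch: w0 = torch.zeros(feat_dim, n_way, requires_grad=True)
    #               w_task = inner_loop_adapt(w0, support_feats, support_labels)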
Side: approaches beyond metric-based ones

Large-training-corpus-based
Given a well-trained model (say, ImageNet 1k classification), how can we quickly adapt to the new classes with few samples without forgetting the old categories?
What's wrong with existing methods?

Consider a 5-way 1-shot problem where each sample has only two features (color and shape) and each class has one support sample. Given a query (green, circle), which class (i - v) does the query belong to?

Problematic: comparing the query with each support class independently leaves multiple choices at equal distances!

Instead, find the relevant feature by looking at all classes (i - v) simultaneously: the shape is shared across the support classes, while the color is unique to each class. Considering color only, the query belongs to class iii.

This motivates us to look for inter-class uniqueness by traversing the features of all classes in the support set (the projector in CTM).
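A toy numeric version of this ambiguity; the integer codes for colors and shapes are made up purely for illustration.

    import numpy as np

    # One support sample per class (5-way 1-shot), features = (color, shape).
    # Colors: 0=red 1=blue 2=green 3=yellow 4=purple; shapes: 0=circle 1=square.
    support = np.array([[0, 0], [1, 0], [2, 1], [3, 0], [4, 0]])   # classes i - v
    query   = np.array([2, 0])                                     # a green circle

    both = (support != query).sum(axis=1)                 # distance over both features
    print(both)                                           # [1 1 1 1 1] -> every class equally close!

    color_only = (support[:, 0] != query[0]).astype(int)  # drop the shared "shape" feature
    print(color_only)                                     # [1 1 0 1 1] -> only class iii matches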
Extending one-shot to the K-shot case

With K shots (here, 5) for one class, we look for commonality within the class. In this example, the color dimension is consistent across the shots and hence more representative than the shape.
Averaging the samples within one class, as widely done in previous methods, captures exactly this intra-class commonality (the concentrator in CTM).
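A small numeric illustration (made-up encodings) of why within-class statistics expose the commonality: the consistent, low-variance dimension is the representative one, while the noisy dimension washes out.

    import numpy as np

    # Five shots of one class, features = (color, shape); hypothetical noisy encodings.
    shots = np.array([[2.0, 0.1],
                      [2.1, 0.9],
                      [1.9, 0.4],
                      [2.0, 0.7],
                      [2.0, 0.2]])

    print(shots.mean(axis=0))   # class representative (the "prototype"): ~[2.0, 0.46]
    print(shots.var(axis=0))    # per-dimension spread: color ~0.004, shape ~0.09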
Solution: Category Traversal Module (CTM)

The proposed CTM is simple and realizes both motivations above. It consists of two parts, a concentrator and a projector, applied to the output of the feature extractor in an N-way K-shot problem.

Concentrator: intra-class commonality. The K support features of each class are summarized into a single per-class feature, in the spirit of the within-class averaging widely used by previous methods.

Projector: inter-class uniqueness. The N per-class features are reshaped so that the projector traverses all classes at once and outputs the ultimate "mask" p over the feature dimensions.

Operations after CTM: once we have the mask p, the features (both support and query) are modified by element-wise multiplication with p before being fed into the metric-learning module.
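A minimal PyTorch-style sketch of this flow. The concentrator here is a plain average over the shots and the projector is a 1x1 convolution followed by a softmax; the actual CTM uses learned convolutional blocks and different feature shapes, so treat the layers, names, and dimensions below as assumptions.

    import torch
    import torch.nn as nn

    class CTMSketch(nn.Module):
        # Simplified Category Traversal Module: concentrator + projector -> mask p.
        def __init__(self, n_way, k_shot, channels):
            super().__init__()
            self.n_way, self.k_shot = n_way, k_shot
            # Stand-in projector: sees all N per-class features at once (concatenated
            # along the channel dimension) and emits one mask over the feature channels.
            self.projector = nn.Conv2d(n_way * channels, channels, kernel_size=1)

        def forward(self, support_feats):
            # support_feats: (N*K, C, H, W), the feature-extractor output for the support set.
            _, c, h, w = support_feats.shape
            per_class = support_feats.view(self.n_way, self.k_shot, c, h, w).mean(dim=1)  # concentrator
            traversed = per_class.reshape(1, self.n_way * c, h, w)       # reshape: traverse all classes
            logits = self.projector(traversed)                           # (1, C, H, W)
            p = torch.softmax(logits.flatten(1), dim=1).view(1, c, h, w) # the mask p
            return p

    # After CTM: element-wise multiplication before the metric-learning module, e.g.
    #   support_feats = support_feats * p
    #   query_feats   = query_feats * p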
Category Traversal Module vs Previous Approaches

Previous (metric-based) approaches: compare the query and support features directly, treating each support class on its own.
Category Traversal Module: the features are first modulated by the task-specific mask p, making them more representative of the relevant dimensions.
Experiments - CTM as a plug-and-play module in existing methods

[Table: mean accuracy (%) over 600 randomly generated test episodes under N-way K-shot settings on miniImageNet and tieredImageNet; in every test episode, each class has 15 queries.]
Experiments - comparison with SOTAs

[Table: comparison with state-of-the-art methods, grouped into optimization-based, large-training-corpus-based, and metric-based approaches.]
Finding Task-relevant Features
for Few-shot Learning by Category Traversal
Hongyang Li, David Eigen, Samuel Dodge, Matt Zeiler, Xiaogang Wang
Poster #1, starts at 10:15 AM
Code --->
yangli@ee.cuhk.edu.hk