ZeroC: A Neuro-Symbolic Model for
Zero-shot Concept Recognition and Acquisition at Inference Time
NeurIPS 2022
Tailin Wu1, Megan Tjandrasuwita2, Zhengxuan Wu1, Xuelin Yang1, Kevin Liu1, Rok Sosic1, Jure Leskovec1
1 Stanford University
2 MIT
Motivation
Humans have the remarkable ability to recognize and acquire novel visual concepts in a zero-shot manner
Suppose we humans have only learned the concept “line” and the relations “parallel” and “perpendicular”:
Prior knowledge: “line” (concept), “parallel” (relation), “perpendicular” (relation)
Given: Symbolic structure of a new concept
E.g., when told that a “rectangle” consists of two pairs of “lines”, where the lines within each pair are “parallel,” and the lines between the pairs are “perpendicular”
Zero-shot recognition of novel (hierarchical) concepts: recognize a “rectangle”
Zero-shot acquisition of novel (hierarchical) concepts:
Given a single demonstration: an image of a “rectangle”
Zero-shot acquisition: infer the symbolic structure of the new concept, i.e., that a “rectangle” consists of two pairs of “lines”, the lines within each pair are “parallel,” and the lines between the pairs are “perpendicular”
Problem definition and significance:
How can we endow machine learning (ML) models with the capability of zero-shot recognition and acquisition of hierarchical visual concepts?
Having such capability will allow ML models to tackle more complex tasks at inference time, without further training on those specific tasks.
Why is it hard:
Machine learning models typically generalize only to examples drawn from the same or a similar distribution as the training data. Here we want the model to generalize to more complex, hierarchical concepts that it has not seen before.
Prior methods:
Only address part of the problem:
[1] Du et al. NeurIPS 2020
[2] Higgins et al. ICLR 2018
[3] Andreas et al. CVPR 2016
[4] Snell et al. NeurIPS 2017
[5] Mao et al. ICLR 2019
[6] Kipf et al. ICLR 2018
[7] Shanahan et al. ICML 2020
[8] Romera et al. ICML 2015
[9] Bucher et al. ICCV 2017
[10] Schonfeld et al. CVPR 2019
Our contribution:
In this work, we introduce Zero-shot Concept Recognition and Acquisition (ZeroC) to address this problem.
ZeroC represents concepts as graphs of constituent concept models (as nodes) and their relations (as edges). This gives a one-to-one mapping between the symbolic graph structure of a concept and its corresponding recognition model.
For the first time, it allows acquiring new concepts, communicating their graph structure, and applying them to classification and detection tasks (even across domains) at inference time.
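To make the graph representation concrete, here is a minimal sketch of a concept graph as a plain data structure, using the “rectangle” example from the motivation. The `ConceptGraph` class and its JSON format are hypothetical illustrations, not ZeroC's actual code; they only show that a concept's symbolic structure (concept names on nodes, relation names on edges) can be communicated independently of any learned model.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ConceptGraph:
    """Hypothetical container: nodes are constituent concept names, edges are relations."""
    name: str
    nodes: list[str]                                                  # node i -> concept name
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (i, j, relation name)

    def to_json(self) -> str:
        # Communicating a concept means sending its symbolic structure, not model weights.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ConceptGraph":
        d = json.loads(text)
        # JSON turns tuples into lists, so convert the edges back to tuples.
        return cls(d["name"], d["nodes"], [tuple(e) for e in d["edges"]])


# "rectangle": two pairs of lines, parallel within a pair, perpendicular across pairs.
rectangle = ConceptGraph(
    name="rectangle",
    nodes=["line", "line", "line", "line"],
    edges=[(0, 1, "parallel"), (2, 3, "parallel"),
           (0, 2, "perpendicular"), (0, 3, "perpendicular"),
           (1, 2, "perpendicular"), (1, 3, "perpendicular")],
)

received = ConceptGraph.from_json(rectangle.to_json())
```

Because only names and indices are transmitted, a receiver that already has models for “line”, “parallel”, and “perpendicular” could in principle rebuild a recognizer for “rectangle” from this graph alone.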
Illustration of a concept:
[Figure: the elementary concept “line”. Observed: an image of a line (observation) and the mask of the line (mask). Mental representation: the concept name “line”, its concept graph (a single node, “line”), and a concept probability model.]
Illustration of a hierarchical concept:
[Figure: the hierarchical concept “parallel-line”. Observed: an image of a parallel-line (observation) and the mask of the parallel-line (mask). Mental representation: the concept name “parallel-line”, its concept graph (two “line” nodes, which are elementary concepts, connected by a “parallel” edge, which is a relation), and a concept probability model.]
Question: how do we compose the probability function of a hierarchical concept?
How do we construct the concept probability model for a hierarchical concept (e.g. “parallel-line”), using the constituent probability models?
Here the f are non-negative functions
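A hedged sketch of such a factorization for “parallel-line”, assuming one non-negative factor per node and per edge of its concept graph, and assuming the mask $m$ of the whole concept is the union of the constituent masks $m_1$ and $m_2$ (the notation is illustrative):

```latex
p(x, m \mid \text{parallel-line}) \;\propto\;
  f_{\text{line}}(x, m_1)\, f_{\text{line}}(x, m_2)\, f_{\text{parallel}}(x, m_1, m_2),
  \qquad m = m_1 \cup m_2
```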
Energy-based models
The probability function can be written in terms of an energy-based model (EBM) $E(x)$ as $p(x) \propto \exp(-E(x))$, where the EBM maps the input $x$ to a scalar value that we call the “energy”.
The benefit of using EBMs is that multiplication of probabilities translates to addition of the energy terms: $\prod_i p_i(x) \propto \exp\big(-\sum_i E_i(x)\big)$
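The product-to-sum property can be checked numerically on a small discrete input space. This sketch uses two arbitrary, hypothetical energy tables and shows that multiplying their normalized distributions and renormalizing gives exactly the distribution of the summed energy.

```python
import numpy as np

# Two hypothetical energy functions over a discrete input space of 5 values.
E1 = np.array([0.0, 1.0, 2.0, 0.5, 3.0])
E2 = np.array([1.0, 0.0, 0.5, 2.0, 0.2])


def boltzmann(E):
    """p(x) = exp(-E(x)) / Z, normalized over the discrete domain."""
    w = np.exp(-E)
    return w / w.sum()


# Product of the two distributions, renormalized ...
p_product = boltzmann(E1) * boltzmann(E2)
p_product /= p_product.sum()

# ... matches the distribution whose energy is E1 + E2.
p_sum = boltzmann(E1 + E2)
```

The normalizers $Z_1 Z_2$ cancel after renormalization, which is why composing models only requires adding energies.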
Du, Yilun, et al. Compositional Visual Generation with Energy Based Models, NeurIPS 2020: 6637-6647.
Sampling: start with a random $x$, then perform gradient descent with noise on $E(x)$ until reaching a low-energy input.
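A minimal sketch of this sampling procedure, assuming a toy quadratic energy $E(x) = \tfrac{1}{2}\lVert x - \mu \rVert^2$ with an analytic gradient. The step size, noise scale, and step count are illustrative choices, and the noise scale is set independently of the step size here (exact Langevin dynamics would tie the two together).

```python
import numpy as np


def sample_ebm(grad_E, x0, step_size=0.1, noise_scale=0.05, n_steps=200, seed=0):
    """Gradient descent with Gaussian noise on E(x):
    x <- x - step_size * grad_E(x) + noise_scale * N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step_size * grad_E(x) + noise_scale * rng.standard_normal(x.shape)
    return x


# Toy energy E(x) = 0.5 * ||x - mu||^2; its gradient is (x - mu), so low energy is near mu.
mu = np.array([2.0, -1.0])


def grad_E(x):
    return x - mu


x_start = np.random.default_rng(1).uniform(-5.0, 5.0, size=2)  # random initial input
x_final = sample_ebm(grad_E, x_start)
```

With a learned EBM, `grad_E` would come from automatic differentiation of the network with respect to its input rather than from a closed-form expression.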
Our contribution: an energy-based model for concepts
$E(x, m, c)$: the energy is low only if the observation $x$, the mask $m$, and the concept name $c$ are consistent.
Example task: detecting a concept.
Given an image $x$ and a concept name $c$, infer the mask $m$.
Solution: diffuse to an $m$ that minimizes the energy $E(x, m, c)$.
[Figure: starting from a random initial mask, the mask diffuses to the correct mask corresponding to the concept.]
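The detection step can be sketched as noisy gradient descent on the mask while the image and concept name stay fixed. Here the trained concept EBM is replaced by a hypothetical hand-written energy, `CONCEPT_PIXELS` plus a squared mismatch, so that the gradient has a closed form; the mask is relaxed to $[0, 1]$, the noise is annealed to zero, and the result is thresholded at 0.5. None of these names or settings come from ZeroC itself.

```python
import numpy as np

# Hypothetical stand-in for a trained concept EBM: for each concept name c,
# a function returning the pixels of image x that belong to c.
CONCEPT_PIXELS = {
    "line": lambda x: (x > 0.5).astype(float),
}


def energy(x, m, c):
    """Toy E(x, m, c): low only when mask m agrees with concept c's pixels in x."""
    return float(np.sum((m - CONCEPT_PIXELS[c](x)) ** 2))


def energy_grad_m(x, m, c):
    """Gradient of the toy energy with respect to the mask m."""
    return 2.0 * (m - CONCEPT_PIXELS[c](x))


def detect(x, c, n_steps=100, step_size=0.1, noise_scale=0.1, seed=0):
    """Infer a mask for concept c in image x by noisy gradient descent on E."""
    rng = np.random.default_rng(seed)
    m = rng.uniform(0.0, 1.0, size=x.shape)  # random initial mask
    for k in range(n_steps):
        noise = noise_scale * (1 - k / n_steps) * rng.standard_normal(m.shape)  # annealed
        m = np.clip(m - step_size * energy_grad_m(x, m, c) + noise, 0.0, 1.0)
    return m > 0.5  # binarize the relaxed mask


x = np.zeros((6, 6))
x[2, 1:5] = 1.0  # an image containing a short horizontal line
mask = detect(x, "line")
```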
Our contribution: ZeroC (Zero-shot Concept Recognition and Acquisition)
Hierarchical concept model as a composition of constituent concepts and relations
Key innovation: the Hierarchical Composition Rule (e.g., “parallel-line”)
Concept graph for “parallel-line”: two “line” nodes connected by a “parallel” edge, in one-to-one correspondence with the composed concept model of “parallel-line”.
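A minimal sketch of composing a hierarchical concept's energy from its graph, assuming (as the product-to-sum property suggests) one concept-EBM term per node plus one relation-EBM term per edge, evaluated on given constituent masks. The learned EBMs are replaced by hypothetical hand-written energies (`line_energy`, `parallel_energy`) that return 0 when consistent and 1 otherwise; ZeroC's actual rule also ties the constituent masks to the mask of the whole concept, which this sketch does not search over.

```python
import numpy as np


def orientation(mask):
    """'h' if all mask pixels lie in one row, 'v' if in one column, else None."""
    rows, cols = np.nonzero(mask)
    if len(rows) < 2:
        return None
    if np.all(rows == rows[0]):
        return "h"
    if np.all(cols == cols[0]):
        return "v"
    return None


# Hypothetical hand-written energies standing in for learned EBMs.
def line_energy(x, m):
    return 0.0 if orientation(m) is not None else 1.0


def parallel_energy(x, m1, m2):
    o1, o2 = orientation(m1), orientation(m2)
    return 0.0 if o1 is not None and o1 == o2 else 1.0


CONCEPT_EBMS = {"line": line_energy}
RELATION_EBMS = {"parallel": parallel_energy}


def compose(nodes, edges):
    """Energy of a hierarchical concept: sum of node (concept) and edge (relation) energies."""
    def energy(x, masks):
        total = sum(CONCEPT_EBMS[c](x, masks[i]) for i, c in enumerate(nodes))
        total += sum(RELATION_EBMS[r](x, masks[i], masks[j]) for i, j, r in edges)
        return total
    return energy


parallel_line = compose(nodes=["line", "line"], edges=[(0, 1, "parallel")])

x = np.zeros((5, 5))                                 # image (unused by the toy energies)
m_top = np.zeros((5, 5), dtype=bool); m_top[1, :] = True
m_bot = np.zeros((5, 5), dtype=bool); m_bot[3, :] = True
m_col = np.zeros((5, 5), dtype=bool); m_col[:, 2] = True
e_parallel = parallel_line(x, [m_top, m_bot])       # two horizontal lines
e_crossing = parallel_line(x, [m_top, m_col])       # horizontal + vertical line
```

The graph maps directly onto the sum: adding a node or edge to the concept graph adds exactly one energy term, which is the one-to-one correspondence described above.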
ZeroC: Zero-shot Concept Recognition and Acquisition
Training:
Given: data tuples $(x, m, c)$ or $(x, m_1, m_2, r)$
Learn: energy-based models $E_\theta(x, m, c)$ or $E_\theta(x, m_1, m_2, r)$
x: input
m: mask
c: concept name
r: relation name
We augment the state-of-the-art EBM training objective [1] with three additional regularization terms, derived from first principles, that:
make sure positive examples have similar energies
ensure consistency in concept acquisition
encourage “connected” masks
[1] Du, Yilun, et al. "Improved contrastive divergence training of energy based models." ICML 2021
Inference: (1) Zero-shot concept recognition
E.g., for the concept “Fshape”:
Given: graph structure of a hierarchical concept
Compose: ZeroC first composes an EBM based on the given graph:
Detection (infer the mask $m$ given image $x$ and concept name $c$):
E.g., for the concept “Eshape”:
Given: graph structure of a hierarchical concept
Classification: the example image is classified correctly.
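A minimal sketch of energy-based classification, assuming the classifier infers a mask for each candidate concept and then picks the concept with the lowest energy. The detector `infer_mask` and the energy `mismatch_energy` are hypothetical hand-written stand-ins for learned EBMs and EBM-based detection, used only to make the selection rule concrete.

```python
import numpy as np


def infer_mask(x, c):
    """Hypothetical detector: take the brightest row (horizontal) or column (vertical)."""
    m = np.zeros_like(x, dtype=bool)
    if c == "horizontal-line":
        m[np.argmax(x.sum(axis=1)), :] = True
    else:
        m[:, np.argmax(x.sum(axis=0))] = True
    return m


def mismatch_energy(x, m):
    """Toy energy: number of pixels where the mask and the bright pixels disagree."""
    return float(np.sum((x > 0.5) != m))


def classify(x, candidates, energy_fns):
    """Return the candidate concept with the lowest energy, plus all energies."""
    scores = {c: energy_fns[c](x, infer_mask(x, c)) for c in candidates}
    return min(scores, key=scores.get), scores


energy_fns = {"horizontal-line": mismatch_energy, "vertical-line": mismatch_energy}
x = np.zeros((4, 4))
x[:, 1] = 1.0  # an image containing a vertical line
label, scores = classify(x, ["horizontal-line", "vertical-line"], energy_fns)
```

For a composed concept such as “Eshape”, each entry of `energy_fns` would be a composed EBM rather than a single hand-written function.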
Inference: (2) Zero-shot concept acquisition
This is difficult because it amounts to subgraph isomorphism, an NP-hard problem.
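To illustrate why this search is expensive, here is a brute-force labeled subgraph matcher over a hypothetical graph of detected parts. It tries every injective assignment of pattern nodes to host nodes, so the number of candidates grows as $n!/(n-k)!$ for $k$ pattern nodes and $n$ host nodes. This is a generic illustration of the combinatorics, not ZeroC's acquisition procedure; edges are treated as directed, so a symmetric relation would need to be listed in both directions to match either way.

```python
from itertools import permutations


def find_subgraph(pattern_nodes, pattern_edges, host_nodes, host_edges):
    """Brute-force labeled subgraph isomorphism: exponential in len(pattern_nodes).
    Edges are (i, j, relation) triples; returns a mapping tuple or None."""
    host_edge_set = set(host_edges)
    k = len(pattern_nodes)
    for mapping in permutations(range(len(host_nodes)), k):
        # Node labels (concept names) must match.
        if any(pattern_nodes[i] != host_nodes[mapping[i]] for i in range(k)):
            continue
        # Every pattern edge must appear in the host with the same relation name.
        if all((mapping[i], mapping[j], r) in host_edge_set for i, j, r in pattern_edges):
            return mapping
    return None


# Hypothetical host graph: four lines forming a rectangle, plus a circle.
host_nodes = ["line", "line", "line", "line", "circle"]
host_edges = [(0, 1, "parallel"), (2, 3, "parallel"),
              (0, 2, "perpendicular"), (0, 3, "perpendicular"),
              (1, 2, "perpendicular"), (1, 3, "perpendicular")]

match_parallel = find_subgraph(["line", "line"], [(0, 1, "parallel")], host_nodes, host_edges)
match_perp = find_subgraph(["line", "line"], [(0, 1, "perpendicular")], host_nodes, host_edges)
match_none = find_subgraph(["line", "circle"], [(1, 0, "inside")], host_nodes, host_edges)
```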
Experiment 1: zero-shot recognition
Training dataset (HDConcept: elementary concepts and relations):
Training on the concepts “Eshape” and “rectangle” and the relations “inside”, “non-overlap”, and “outside”:
Test dataset (HDConcept: hierarchical concepts):
Test on hierarchical concepts (e.g., Concept1) that combine “Eshape” and “rectangle” in specific ways, e.g.:
Experiment 2: zero-shot acquisition
Example task:
2D to 3D transfer of concepts without training:
*We use a stringent subgraph isomorphism accuracy, which is 1 only if the inferred graph is isomorphic to the ground truth.
For example, if a graph has 10 nodes/edges and each is inferred correctly with probability 0.8, the overall accuracy is only $0.8^{10} \approx 0.107$.
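The arithmetic behind this example, assuming the 10 node/edge predictions are independent so that the probability of getting the whole graph right is the product of the per-element accuracies:

```python
p_element = 0.8   # accuracy of each individual node/edge prediction
n_elements = 10   # number of nodes + edges in the graph (assumed)

# All elements must be correct for the graph to count as correct.
overall = p_element ** n_elements
print(f"{overall:.3f}")  # prints 0.107
```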
Experiment 3: CLEVR dataset:
ZeroC outperforms the strong CADA-VAE baseline and classifies the hierarchical concepts reasonably well.
Summary:
In this work, we introduce Zero-shot Concept Recognition and Acquisition (ZeroC), a neuro-symbolic architecture that can recognize and acquire novel concepts in a zero-shot way.
It is able to perform:
For more, see our paper and code at http://snap.stanford.edu/zeroc/.