Sparse Decomposition of
Summed Semantic Vectors
Douglas Summers-Stay1, Peter Sutor2, Dandan Li1
1. U.S. Army Research Laboratory
2. University of Maryland
Outline
Concepts
Properties, Prototypes, Exemplars, or Theories?
Properties: a formal definition, e.g. “man is a rational animal”
Prototypes: a sort of weighted average of its examples
Exemplars: a set of examples
Theories: a defining structure of interconnected relations
Concepts
Concepts can...
Artificial Natural Concepts
They don’t need to be biologically accurate, but they need to be able to perform the tasks that human concepts can perform.
Part 2: Summed Sets of Terms
Distributional Semantic Vector:
Averaged semantic vectors
Pentti Kanerva:
“The sum (and the mean) of random vectors has the following important property: it is similar to each of the vectors being added together. The similarity is very pronounced when only a few vectors are added and it plays a major role in artificial neural-net models. The sum-vector is a possible representation for the set that makes up the sum.”
Nearest neighbors of an average of two terms
words near protect: protect, safeguard, protecting, protects, shield, protected, protection, preserve, prevent
words near animal: animal, animals, Animal, animal welfare, dog, pet, cats, Animals, dogs, feline
words near (protect + animal)/2: protect, animal, safeguard, animals, protecting, humanely euthanize, preserve, animal welfare
The top two results are the inputs themselves; this is almost always the case.
Averaging more vectors, however...
(David + sorrow + California + space + musician)/5:
David, Debra Cassens, musician, Michael, Denice Thibodeau, Zachary Pincus, sorrow, Funeral Home Millvale…
space shows up as the 48th closest term
California shows up as the 178th closest term
The top 5 nearest terms are NOT all the terms averaged.
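A minimal runnable sketch of these neighborhood experiments, using random vectors in place of real word embeddings (the vocabulary, dimensions, and variable names below are stand-ins, not the embeddings used here): with only two summands, the inputs dominate the neighbor list, exactly as described above.

import numpy as np

rng = np.random.default_rng(0)
dim = 600
vocab = ["protect", "animal"] + [f"word{i}" for i in range(5000)]
emb = {w: rng.standard_normal(dim) for w in vocab}   # random stand-ins for real embeddings

def nearest(query, emb, k=5):
    """Return the k vocabulary words with highest cosine similarity to query."""
    words = list(emb)
    mat = np.stack([emb[w] for w in words])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    sims = mat @ (query / np.linalg.norm(query))
    return [words[i] for i in np.argsort(-sims)[:k]]

avg = (emb["protect"] + emb["animal"]) / 2
print(nearest(avg, emb))    # the two summands come back as the top neighbors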
Research Question: How many vectors that were averaged together can we recover from the average and a dictionary?
A weighted average belongs to its own simplex
Choose a point on one of these triangles in 3D space; with nearly 100% probability it does not also lie on any other triangle.
Barycentric coordinates
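As a sketch of the barycentric-coordinate idea (assuming the point really is a convex combination of a few known dictionary vectors; the vectors below are random stand-ins), the mixing weights can be recovered by appending a sum-to-one row to the linear system.

import numpy as np

rng = np.random.default_rng(1)
d, k = 600, 3
V = rng.standard_normal((k, d))            # the k vertices of the simplex
true_w = np.array([0.2, 0.3, 0.5])
p = true_w @ V                             # a point inside the simplex

A = np.vstack([V.T, np.ones((1, k))])      # stack the vertices and a sum-to-one row
b = np.concatenate([p, [1.0]])
w, *_ = np.linalg.lstsq(A, b, rcond=None)  # barycentric coordinates of p
print(np.round(w, 3))                      # ~ [0.2, 0.3, 0.5]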
Sparse decomposition
LASSO is a tool for decomposing a vector over a dictionary so that most of the dictionary receives zero weight (sparsity).
LASSO has a tunable parameter λ that controls sparsity. We test over the full range of λ values (using the Dual Polytope Projection screening rules) to gather candidate terms, and then use barycentric coordinates to solve for the weights.
[Plot: terms with non-zero weight at some value of λ; x-axis shows λ × 100]
λ = 0 means ordinary least squares
λ = 1 means find the single best word
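A hedged sketch of this pipeline with scikit-learn, illustrating the idea rather than reproducing the implementation used here: a small random dictionary stands in for the 500,000-term embedding matrix, LASSO is run along the full λ path to collect candidate terms (no screening rules), and exact non-negative weights are then solved for on those candidates.

import numpy as np
from sklearn.linear_model import lasso_path
from scipy.optimize import nnls

rng = np.random.default_rng(2)
d, n_terms, k = 600, 2000, 10
D = rng.standard_normal((n_terms, d))                 # dictionary: one row per term
true_idx = rng.choice(n_terms, size=k, replace=False)
true_w = rng.dirichlet(np.ones(k))                    # convex weights (sum to 1)
y = true_w @ D[true_idx]                              # the observed weighted average

# Candidates: every term that gets non-zero weight at some value of lambda.
alphas, coefs, _ = lasso_path(D.T, y, positive=True, n_alphas=50)
candidates = np.where(np.abs(coefs).max(axis=1) > 0)[0]

# Exact non-negative weights restricted to the candidate terms.
w, _ = nnls(D[candidates].T, y)
recovered = {int(i): round(float(wi), 3) for i, wi in zip(candidates, w) if wi > 1e-6}
print(sorted(recovered.items()))                      # compare with zip(true_idx, true_w)
print(round(float(w.sum()), 3))                       # ~1.0: the weights lie on a simplex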
Sparse decomposition is biologically plausible
Beyeler, M., Rounds, E., Carlson, K., Dutt, N., and Krichmar, J. L. (2017). Sparse coding and dimensionality reduction in cortex. bioRxiv, 149880.
Coultrip, R., Granger, R., and Lynch, G. (1992). A cortical model of winner-take-all competition via lateral inhibition. Neural Networks, 5(1), 47–54.
Full dictionary (500,000 terms)
Take the weighted average of about 45 vectors (each 600-dimensional): all 45 vectors and their weights can be recovered!
Dictionary of 10,000 terms
Now 90 vectors and weights can be recovered.
Full dictionary, semantically related terms
Averaging a set of vectors that share some meaning, the number of 600-dimensional vectors and weights that can be recovered rises to nearly 200.
Full dictionary, with noise
Noise reduces how many vectors and weights can be recovered, but more than 25 can still be recovered despite some noise.
Recovery depends on the size of the vectors, the number averaged, and the amount of noise.
Theoretical guarantees cover roughly 50% of the recovery capacity typically observed in practice.
What is this useful for?
1. Semantic betweenness
For example, nashi = .5 apple + .5 pear
Properties of nashi are properties of apple, of pear, or somewhere between the two, unless we are told explicitly otherwise (for example, that nashi are popular in Japan but not in the U.S.)
Sparse decomposition recovers the concepts being averaged and their weights
weights must sum to 1
2. A multiset
Addition and subtraction add and subtract elements from the multiset as expected
Sparse decomposition recovers the members of the set and the weights
Weights must be non-negative integers
3. A fuzzy set
Fractional weights can represent degrees of inclusion in the set. Is ketchup a vegetable? If 1 is yes and 0 is no, ketchup could be somewhere in between.
4. A set
A set is a multiset where each weight must be one or zero
When we add two sets or multisets, their weights add, so elements that appear in both sets are more highly weighted. Since the weights can be precisely recovered, we can determine which part of the Venn diagram each element belongs to.
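A toy sketch of this set reading (random vectors stand in for real embeddings; the pet/fish vocabulary is only for illustration): encode each set as the sum of its members, add the two set vectors, and decompose the result. Members of both sets come back with weight 2, members of a single set with weight 1.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
vocab = ["dog", "cat", "goldfish", "canary", "shark", "fern"] + [f"word{i}" for i in range(200)]
emb = {w: rng.standard_normal(300) for w in vocab}

pets = ["dog", "cat", "goldfish", "canary"]
fish = ["goldfish", "shark"]
both = sum(emb[w] for w in pets) + sum(emb[w] for w in fish)

D = np.stack([emb[w] for w in vocab])      # dictionary matrix, one row per word
w, _ = nnls(D.T, both)                     # non-negative decomposition
print({vocab[i]: round(float(wi)) for i, wi in enumerate(w) if wi > 0.5})
# goldfish (a member of both sets) comes back with weight 2; the others with weight 1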
5. A probability distribution
If we constrain all weights to sum to 1, then the weights can represent a probability distribution over words.
6. An information structure
(This is a little contrived, but…) the sentence
The fox jumped over the dog
could be represented as
.2 fox + .3 jumped over + .5 dog
where we arbitrarily assign weight .2 to the subject, .3 to the predicate, and .5 to the object
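A small sketch of this role-by-weight encoding, with random stand-in vectors and the arbitrary .2/.3/.5 convention above: decomposition recovers the weights, and the weights identify the roles.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)
vocab = ["fox", "jumped_over", "dog"] + [f"word{i}" for i in range(200)]
emb = {w: rng.standard_normal(300) for w in vocab}
roles = {0.2: "subject", 0.3: "predicate", 0.5: "object"}

sentence = 0.2 * emb["fox"] + 0.3 * emb["jumped_over"] + 0.5 * emb["dog"]

D = np.stack([emb[w] for w in vocab])
w, _ = nnls(D.T, sentence)                 # recover the words and their weights
for i, wi in enumerate(w):
    if wi > 1e-3:
        print(vocab[i], "->", roles[round(float(wi), 1)])
# fox -> subject, jumped_over -> predicate, dog -> object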
Set logic
The ability to figure out what region a vector belongs to allows us to perform logical operations on sets
(more on this later in the talk)
Learning from definitions
We can assign a vector to a new term by placing it in the intersection of sets it belongs to (using a literal Venn diagram in the semantic space)
Analogies?
Represent the meaning of the concept Detroit in the sentence “Detroit won the game on Monday night.”
( Detroit + Detroit_Lions + Detroit_Pistons + Detroit_Tigers + Michigan ) / 5 + Cleveland - Detroit
=
.15 Cleveland + .11 Cleveland_Cavaliers + .1 Ohio + .08 Cavaliers + .06 Browns + .06 Cleveland_Indians + ...
Concepts revisited
Are concepts prototypes or exemplars?
The same activation state representation could be both at once:
apply nearest neighbor to a state and it acts like a prototype
apply decomposition to a state and it acts like a set of exemplars
the vector (pet + fish)/2 acts like a new exemplar whose nearest neighbors lie near the intersection of the set of pets and the set of fish
Concepts can...
Part 3: Answering questions by interpolation and analogy
[Figure: parallel vectors pairing scientists (astronomer, physicist, biologist, roboticist, chemist, paleontologist, psychologist) with what they study (physics, chemicals, robots, dinosaurs, life, stars)]
Parallel semantic vectors
Beijing - China + Russia ≈ Moscow
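This offset analogy can be reproduced with off-the-shelf pretrained embeddings, for example via gensim (the model file name below is a placeholder; any word2vec-format model can be substituted).

from gensim.models import KeyedVectors

# Load a pretrained word2vec-format model (the file name is a placeholder).
kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# Beijing - China + Russia  ≈  Moscow
print(kv.most_similar(positive=["Beijing", "Russia"], negative=["China"], topn=3))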
Examples of semantic vectors on “impossibly hard” analogy problems
Hofstadter (1995): “What possible set of a priori criteria would allow a computer to reply, perfectly self-confidently, that the country of Monaco is ‘the Atlantic City of France’?”
Vegas : Nevada :: ??? : France
1. Paris
2. Nice (the French city right next to Monaco)
3. EuroDisney
4. Monte Carlo
...
15. Monaco
"Relational priming is to analogy-making as one-ball juggling is to seven-ball juggling," by Robert M. French (2008):
"Ty Cobb ... after being egged on by reporters, he said, “You want me to hit home runs like Babe Ruth? No problem.” So, in the next game he hit three over-the-wall home runs and two more in the following game, after which he said, “I told you I could hit home runs. Now I’m going back to playing baseball the way it was meant to be played.” It occurred to me that this is analogous to a professor who publishes only books and who is criticized for never publishing in journals... he rapidly racks up a number of publications in the top journals in his field. Thereafter, he returns to writing books... This is the heart of analogy-making, and it is not in the least clear how relational priming, as implemented by the authors, could begin to deal with this problem."
"Babe Ruth : homerun :: ??? : scientific study"
1."researchers"
2. "scientists"
Rhymes using sums of semantic vectors
spaceship: moon balloon
pillow: head bed
trampoline: elastic gymnastic
rocking chair: knitter sitter
weed: vegetation eradication
novel: fiction depiction
juice: dilute fruit
cookie: naughty biscotti
Oreo: black snack
brain: logical neurological
Lincoln: desegregation oration
Beatles: guitars superstars
honey: bees cheese
flower: bloom perfume, frilly lily
Clinton: she nominee
sun: skylight twilight
Bollywood: Hindi indie, Delhi telly
bear : hiker :: shark : ???
(forest + predator) : (forest + tourist) :: (ocean + predator) : (ocean + tourist)
B + C - A = D
forest + tourist + ocean + predator - forest - predator = snorkeler
Answering questions in semantic vector space
“scientist” simplex
X studies Y
“Gärdenfors (2000) proposed that concepts should be modeled as convex regions of a conceptual space. While convexity may seem a strong assumption, it is a remarkably regular property of many conceptual representations grounded in perception (e.g., color, taste, vowels) (Jäger 2007).”
Results on QA with simplex
Answering queries of the form (entity, relation, ?X), where ?X does not appear in the KB.
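One plausible reading of the simplex idea, sketched below as an illustration rather than the exact procedure used here (emb, vocab, and the fact list are placeholder names): express the query entity as a convex combination of known heads of the relation, then apply the same weights to the corresponding tails.

import numpy as np
from scipy.optimize import nnls

def answer(query_head, facts, emb, vocab):
    """Answer (query_head, relation, ?X) by interpolating known (head, tail) facts."""
    heads = np.stack([emb[h] for h, _ in facts])
    tails = np.stack([emb[t] for _, t in facts])
    # Barycentric-style weights: non-negative, with an appended sum-to-one row.
    A = np.vstack([heads.T, np.ones((1, len(facts)))])
    b = np.concatenate([emb[query_head], [1.0]])
    w, _ = nnls(A, b)
    guess = w @ tails                      # interpolate the tails with the same weights
    sims = {v: float(emb[v] @ guess / (np.linalg.norm(emb[v]) * np.linalg.norm(guess)))
            for v in vocab}
    return max(sims, key=sims.get)

# e.g. facts = [("physicist", "physics"), ("biologist", "life"), ("paleontologist", "dinosaurs")]
# answer("astronomer", facts, emb, vocab)  ->  ideally a word near "astrophysics"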
Part 4: Propositional Deductive Inference with Vectors
The question answering system gives us a region of semantic space where the answer can be found, but we would also like a proof that the answer is true.
Propositional Deductive Inference with Vectors
Goal: find a way to represent facts in a deductive tree such that the sum of supporting facts equals the result of the deduction. Decompose a semantic vector not into examples, but into supporting facts.
So if fact1 implies fact2 which implies fact3 which implies fact4, then
fact1 + fact2 + fact3 should equal fact4
Propositional Deductive Inference with Vectors
If A is true, and A implies B, then B is true.
A + (-A + B) = B
If A OR B is true, and NOT A is true, then B is true.
(A + B) + -A = B
But you run into trouble in these cases:
A OR A becomes A + A
A OR NOT A becomes A + -A
A AND NOT A becomes A, -A
So: simplify first, then encode, and use the encoding only for carrying out the inference.
Propositional Deductive Inference with Vectors
To encode a sentence as vectors:
Replace NOT with -
Replace OR with +
Replace AND with ,
For example:
NOT((NOT P IMPLIES NOT Q) AND NOT R) = NOT((NOT NOT P OR NOT Q) AND NOT R)
in CNF:
(NOT P OR R) AND (Q OR R)
with replacements:
{-P + R,
Q + R}
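A quick check of this encoding with random vectors, as a sketch: atoms get arbitrary vectors, NOT is vector negation, and a clause is the sum of its literals, so the inference steps above reduce to vector addition.

import numpy as np

rng = np.random.default_rng(5)
P, R, A, B = (rng.standard_normal(300) for _ in range(4))

# Modus ponens: A together with (A IMPLIES B), encoded as -A + B, sums to B.
assert np.allclose(A + (-A + B), B)

# Disjunctive syllogism: (A OR B) together with NOT A sums to B.
assert np.allclose((A + B) + (-A), B)

# From the worked example: the clause (NOT P OR R), encoded as -P + R,
# together with the unit clause P, sums to R.
assert np.allclose(P + (-P + R), R)

print("vector encodings of the inference steps check out")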
Part 5: Steps towards predicate logic
A directed knowledge graph of common-sense facts
Finding a chain of reasoning means finding a connected path (or tree) in the graph.
[Figure: a small knowledge graph with nodes John, father, child, road, danger, and stove, and edges labeled is a, cares for, enters, owns, avoids, and exhibits]
Embedded Knowledge Graph
road: (11, 2)
stove: (28, 27)
father: (18, 15)
danger: (30, 12)
John: (13, 7)
child: (5, 23)
vector from John to father: (18, 15) – (13, 7) = (5, 8)
father to child: (5, 23) – (18, 15) = (-13, 8)
child to road: (11, 2) – (5, 23) = (6, -21)
road to danger: (30, 12) – (11, 2) = (19, 10)
John to danger: (30, 12) – (13, 7) = (17, 5)
(5, 8) + (-13, 8) + (6, -21) + (19, 10) = (17, 5) = (30, 12) – (13, 7)
Find vectors that sum to the vector from John to danger to form a chain of reasoning connecting the two ideas.
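A brute-force sketch of this search on the toy 2-D embedding above. The John, father, child, road, danger chain comes from the example; the two stove edges (father owns stove, stove exhibits danger) are assumed readings of the graph, added only for illustration.

import numpy as np
from itertools import combinations

pos = {"John": (13, 7), "father": (18, 15), "child": (5, 23),
       "road": (11, 2), "danger": (30, 12), "stove": (28, 27)}
edges = [("John", "father"), ("father", "child"), ("child", "road"),
         ("road", "danger"), ("father", "stove"), ("stove", "danger")]

def vec(a, b):
    """Difference vector pointing from node a to node b."""
    return np.array(pos[b]) - np.array(pos[a])

target = vec("John", "danger")                        # (17, 5)

# Search every subset of edge vectors for one that sums to the target;
# any subset found chains John to danger.
for r in range(1, len(edges) + 1):
    for subset in combinations(edges, r):
        if np.array_equal(sum(vec(a, b) for a, b in subset), target):
            print(" + ".join(f"{a}->{b}" for a, b in subset))
# Finds John->father->child->road->danger (and John->father->stove->danger
# under the assumed stove edges).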
Reconstructing Path from the Sum
Deductive, Associative and Analogical Reasoning
Combining simplex QA with chains of reasoning
Step 1: try to answer the question with simplex QA
Step 2: gather potential answers A1, …, An
Step 3: choose a set of relations which could result in the appropriate relation, based on Horn rules
Step 4: find chains of reasoning leading from Q to each Ai
Step 5: evaluate each chain for validity based on Horn rules
Example
Grass LocatedNear X?
most similar “locatedNear” facts:
( LocatedNear | husky | washington )
( LocatedNear | nose | eye )
( LocatedNear | bolt | receiver )
top answers: lawn, garden, grassy, vegetation, meadow, pasture, sod, greenery, shrubs, turf, tree, shrub, hay, lawns, park
grass|AtLocation ->lawn 41.359
grass|RelatedTo ->garden 53.894
...
grass|RelatedTo ->grain|RelatedTo ->granary 113.046
...
grass|RelatedTo ->leaf|AtLocation ->floral_arrangement 180.066
Sources of Knowledge
Example: "Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife." Its AMR (Abstract Meaning Representation) parse, shown in part:
:op1 (t2 / thing
:ARG1-of (l / live-01
:ARG0 (p5 / person
:name (n4 / name
:op1 "Dorothy"))
:location (m / midst
:mod (p4 / prairie
:mod (g / great)
:name (n3 / name
:op1 "Kansas"))))
:name (n2 / name
:op1 (u / Uncle)
:op2 (h2 / Henry)))
:op2 (p3 / person
:ARG0-of (f2 / farm-01))
:op3 (t / thing
:name (n / name
:op1 (a / Aunt)
:op2 (e / Em)))
:op4 (p / person
:ARG0-of (h / have-rel-role-91
:ARG1 (p2 / person
:ARG0-of (f / farm-01))
:ARG2 (w / wife))))
Concepts revisited