1 of 59

Sparse Decomposition of Summed Semantic Vectors

Douglas Summers-Stay1, Peter Sutor2, Dandan Li1

1. U.S. Army Research Laboratory

2. University of Maryland

2 of 59

3 of 59

Outline

  • Concepts
  • Sets as summed semantic vectors
  • Question answering by interpolation and analogy
  • Propositional deductive inference
  • Towards predicate logic

4 of 59

Concepts

Properties, Prototypes, Exemplars, or Theories?

Properties: a formal definition, e.g. "man is a rational animal"

Prototypes: a sort of weighted average of its examples

Exemplars: a set of examples

Theories: a defining structure of interconnected relations

5 of 59

Concepts

  • A concept similar to any two others can be found
  • Exemplars can be broken down into exemplars
  • Despite the fact that "goldfish" isn’t a highly prototypical fish or a highly prototypical pet, it is a highly prototypical pet fish
  • Participate in chains of reasoning and trains of thought

6 of 59

Concepts can...

  • include properties and relations
  • be defined by prototypes, examples, and properties, and yet each time the same subject is asked what they are, slightly different answers are generated
  • be used for inference and conceptual combination
  • update many related concepts immediately on learning new information
  • participate in analogies
  • change based on context
  • be defined by relation to other concepts, which themselves seem to be defined by relation to the first concept

7 of 59

Artificial Natural Concepts

They don’t need to be biologically accurate, but they need to be able to perform the tasks that human concepts can perform.

  • Traditional symbolic representations alone can't do this: not continuous enough
  • Current neural representations alone can't do this: not composable enough

8 of 59

Part 2: Summed Sets of Terms

9 of 59

Distributional Semantic Vector:

  • a 300-dimensional real-valued vector for a word in a natural language like English
  • words with similar meanings have similar vectors
  • derived from a small window of neighboring terms in a large text corpus
  • creates a dictionary of up to 3 million words (see the sketch below)
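A minimal sketch of loading such vectors, assuming the Google News word2vec model available through gensim (the talk does not name a specific embedding; any 300-dimensional model with a large vocabulary would do):

    # Hypothetical sketch: load pretrained 300-d word vectors and query neighbors.
    # Assumes the Google News word2vec model; the talk does not name its embedding.
    import gensim.downloader as api

    wv = api.load("word2vec-google-news-300")   # ~3 million words, 300 dimensions

    # Words with similar meanings have nearby vectors:
    print(wv.most_similar("protect", topn=5))
    print(wv.most_similar("animal", topn=5))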

10 of 59

Averaged semantic vectors

Pentti Kanerva:

“The sum (and the mean) of random vectors has the following important property: it is similar to each of the vectors being added together. The similarity is very pronounced when only a few vectors are added and it plays a major role in artificial neural-net models. The sum-vector is a possible representation for the set that makes up the sum.”
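A quick numpy check of Kanerva's observation: the sum of a few random vectors stays close (in cosine similarity) to each of its addends, and the similarity fades as more vectors are added.

    # Sketch: cosine similarity between a sum of k random vectors and each addend.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 300

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    for k in (2, 5, 20, 100):
        vecs = rng.standard_normal((k, d))
        s = vecs.sum(axis=0)
        sims = [cos(s, v) for v in vecs]
        print(f"k={k:3d}  mean cosine(sum, addend) = {np.mean(sims):.2f}")

For random vectors the expected similarity is roughly 1/sqrt(k), which is why recovery by nearest neighbor alone breaks down as the set grows.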

11 of 59

Nearest neighbors of an average of two terms

words near protect: protect, safeguard, protecting, protects, shield, protected, protection, preserve, prevent

words near animal: animal, animals, Animal, animal welfare, dog, pet, cats, Animals, dogs, feline

words near (protect + animal)/2: protect, animal, safeguard, animals, protecting, humanely euthanize, preserve, animal welfare

The top two results are the inputs themselves: this is almost always the case.

12 of 59

Averaging more vectors, however...

(David + sorrow + California + space + musician)/5:

David, Debra Cassens, musician, Michael, Denice Thibodeau, Zachary Pincus, sorrow, Funeral Home Millvale…

space shows up as the 48th closest term

California shows up as the 178th closest term

The top 5 nearest terms are NOT all the terms averaged.

13 of 59

Research Question: How many vectors that were averaged together can we recover from the average and a dictionary?

14 of 59

A weighted average belongs to its own simplex

Choose a point on one of these triangles (2-simplices spanned by triples of dictionary vectors) in 3D space. With nearly 100% probability it does not also lie on another triangle.

15 of 59

Barycentric coordinates

16 of 59

Sparse decomposition

LASSO decomposes a vector over a dictionary so that most dictionary entries receive zero weight (sparsity).

LASSO has a tunable parameter λ that controls sparsity. We test over the full range of λ values (via Dual Polytope Projection) to gather candidates, and then use barycentric coordinates to solve for the weights.
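A minimal sketch of this decomposition on synthetic data, using scikit-learn's lasso_path as a stand-in for the Dual Polytope Projection sweep and an ordinary least-squares re-fit in place of the barycentric-coordinate step:

    # Sketch: decompose a summed vector against a dictionary with LASSO.
    # Candidates = terms that get non-zero weight anywhere along the lambda path;
    # final weights are then re-fit on the candidates alone (a stand-in for the
    # barycentric-coordinate solve described above).
    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(1)
    d, n_dict, k = 300, 2000, 10

    D = rng.standard_normal((d, n_dict))          # dictionary: one column per term
    true_idx = rng.choice(n_dict, size=k, replace=False)
    true_w = rng.dirichlet(np.ones(k))            # weights of a weighted average
    y = D[:, true_idx] @ true_w                   # the summed/averaged vector

    # Sweep lambda; keep every term that is ever active.
    _, coefs, _ = lasso_path(D, y, n_alphas=100)
    candidates = np.where(np.abs(coefs).max(axis=1) > 1e-8)[0]

    # Re-solve for exact weights restricted to the candidates.
    w, *_ = np.linalg.lstsq(D[:, candidates], y, rcond=None)
    recovered = candidates[np.abs(w) > 1e-6]
    print(sorted(true_idx.tolist()), "->", sorted(recovered.tolist()))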

17 of 59

[Plot: number of terms receiving non-zero weight at some value of λ, against λ × 100.]

λ = 0 means ordinary least squares

λ = 1 means find the single best word

18 of 59

Sparse decomposition is biologically plausible

Beyeler, M., Rounds, E., Carlson, K., Dutt, N., & Krichmar, J. L. (2017). Sparse coding and dimensionality reduction in cortex. bioRxiv, 149880.

Coultrip, R., Granger, R., & Lynch, G. (1992). A cortical model of winner-take-all competition via lateral inhibition. Neural Networks, 5(1), 47–54.

19 of 59

Full dictionary (500,000 terms)

Take the weighted average of about 45 vectors of 600 dimensions each:

All 45 vectors and their weights can be recovered!

20 of 59

Dictionary of 10,000 terms

Now 90 vectors and weights can be recovered.

21 of 59

Full dictionary, nearby terms

We averaged a set of vectors that had some meaning in common.

The number of 600-dimensional vectors and weights that can be recovered rises to nearly 200.

22 of 59

Full dictionary, with noise

Noise reduces how many vectors and weights can be recovered, but more than 25 can still be recovered even with some noise.

Recovery depends on the size of the vectors, the number averaged, and the amount of noise.

Theoretical guarantees exist, at about 50% of the recovery typically seen in practice.
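A sketch of the kind of synthetic experiment behind these results; the dictionary size, dimension, and noise level here are illustrative, not the settings reported above.

    # Sketch: average k dictionary vectors, add noise, and count how many of the
    # k terms reappear among the LASSO-path candidates. Sizes are illustrative.
    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(2)
    d, n_dict = 300, 5000
    D = rng.standard_normal((d, n_dict))
    D /= np.linalg.norm(D, axis=0)                # unit-length dictionary entries

    def recovered(k, noise=0.0):
        idx = rng.choice(n_dict, size=k, replace=False)
        y = D[:, idx].mean(axis=1)
        # add noise whose norm is a fraction `noise` of the signal norm
        y = y + noise * np.linalg.norm(y) * rng.standard_normal(d) / np.sqrt(d)
        _, coefs, _ = lasso_path(D, y, n_alphas=100)
        candidates = set(np.where(np.abs(coefs).max(axis=1) > 1e-8)[0])
        return len(candidates & set(idx.tolist()))

    for k in (10, 25, 50):
        print(k, recovered(k), recovered(k, noise=0.05))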

23 of 59

What is this useful for?

24 of 59

1. Semantic betweenness

For example, nashi = .5 apple + .5 pear

Properties of nashi are properties of apple, of pear, or between the two, unless we are told explicitly that this is not the case (for example, that nashi are popular in Japan but not the U.S.)

Sparse decomposition recovers the concepts being averaged and their weights

Weights must sum to 1

25 of 59

2. A multi-set

Addition and subtraction add and subtract elements from the multiset as expected

Sparse decomposition recovers the members of the multiset and their weights

Weights must be non-negative integers

26 of 59

3. A fuzzy set

Fractional weights can represent degrees of inclusion in the set. Is ketchup a vegetable? If 1 is yes and 0 is no, ketchup could be somewhere in between.

This illustration was clearly made in the 1980s

27 of 59

4. A set

A set is a multiset where each weight must be one or zero

When we add two sets or multi-sets, their weights add, so elements that appear in both sets are more highly weighted. Since the weights can be precisely recovered, we can determine what part of the Venn diagram each element belongs to.
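A toy sketch of that idea: two sets encoded as summed vectors, where recovering the weights (here by plain least squares, since the toy dictionary is small and known) tells us which Venn region each term falls in.

    # Toy sketch: random stand-in "term vectors"; two sets as sums; the recovered
    # weight of each term (0, 1, or 2) says which Venn region it occupies.
    import numpy as np

    rng = np.random.default_rng(3)
    terms = ["dog", "cat", "goldfish", "canary", "trout", "shark", "hammer"]
    V = {t: rng.standard_normal(300) for t in terms}

    pets = sum(V[t] for t in ["dog", "cat", "goldfish", "canary"])
    fish = sum(V[t] for t in ["goldfish", "trout", "shark"])
    both = pets + fish

    D = np.stack([V[t] for t in terms], axis=1)       # known dictionary
    w, *_ = np.linalg.lstsq(D, both, rcond=None)
    for t, wi in zip(terms, w):
        n = int(round(wi))
        label = {0: "in neither set", 1: "in exactly one set", 2: "in both sets"}[n]
        print(f"{t:10s} weight {n}  ({label})")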

28 of 59

5. A probability distribution

If we constrain all weights to sum to 1, then the weights can represent a probability distribution over words.

29 of 59

6. An information structure

(This is a little contrived, but…) the sentence

The fox jumped over the dog

could be represented as

.2 fox + .3 jumped over + .5 dog

and we arbitrarily assign .2 for subject, .3 for predicate, and .5 for object

30 of 59

Set logic

The ability to figure out what region a vector belongs to allows us to perform logical operations on sets

(more on this later in the talk)

31 of 59

Learning from definitions

We can assign a vector to a new term by placing it in the intersection of sets it belongs to (using a literal Venn diagram in the semantic space)

32 of 59

Analogies?

Represent the meaning of the concept Detroit in the sentence "Detroit won the game on Monday night."

( Detroit + Detroit_Lions + Detroit_Pistons + Detroit_Tigers + Michigan ) / 5 + Cleveland - Detroit

=

.15 Cleveland + .11 Cleveland_Cavaliers + .1 Ohio + .08 Cavaliers + .06 Browns + .06 Cleveland_Indians + ...
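A minimal sketch of this computation with gensim; the phrase tokens (Detroit_Lions, etc.) follow the slide and assume a vocabulary, such as the Google News word2vec model, that contains them.

    # Sketch: shift a summed "Detroit in a sports context" vector to Cleveland.
    # Token names follow the slide; they assume the embedding vocabulary contains
    # multiword phrases such as "Detroit_Lions" (true of Google News word2vec).
    import numpy as np
    import gensim.downloader as api

    wv = api.load("word2vec-google-news-300")

    senses = ["Detroit", "Detroit_Lions", "Detroit_Pistons", "Detroit_Tigers", "Michigan"]
    context = np.mean([wv[t] for t in senses], axis=0)
    shifted = context + wv["Cleveland"] - wv["Detroit"]

    print(wv.similar_by_vector(shifted, topn=8))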

33 of 59

Concepts revisited

Are concepts prototypes or exemplars?

The same activation state representation could be both at once:

apply nearest neighbor to a state and it acts like a prototype

apply decomposition to a state and it acts like a set of exemplars

the vector (pet + fish)/2 acts as a new exemplar whose nearest neighbors lie near the intersection of the set of pets and the set of fish

34 of 59

Concepts can...

  • include properties and relations
  • be defined by prototypes, examples, and properties, and yet each time the same subject is asked what they are, slightly different answers are generated
  • be used for inference and conceptual combination
  • update many related concepts immediately on learning new information
  • participate in analogies
  • change based on context
  • be defined by relation to other concepts, which themselves seem to be defined by relation to the first concept

35 of 59

Part 3: Answering questions by interpolation and analogy

[Figure: scientists (astronomer, physicist, biologist, roboticist, chemist, paleontologist, psychologist) and the subjects they study (stars, physics, life, robots, chemicals, dinosaurs) plotted as points in semantic space.]

36 of 59

Parallel semantic vectors

Beijing - China + Russia ≈ Moscow

37 of 59

Examples of semantic vectors on “impossibly hard” analogy problems

Hofstadter (1995): “What possible set of a priori criteria would allow a computer to reply, perfectly self-confidently, that the country of Monaco is ‘the Atlantic City of France’?”

Vegas : Nevada :: ??? : France

1. Paris

2. Nice (the French city right next to Monaco)

3. EuroDisney

4. Monte Carlo

...

15. Monaco

38 of 59

"Relational priming is to analogy-making as one-ball juggling is to seven-ball juggling," by Robert M. French (2008):

"Ty Cobb ... after being egged on by reporters, he said, “You want me to hit home runs like Babe Ruth? No problem.” So, in the next game he hit three over-the-wall home runs and two more in the following game, after which he said, “I told you I could hit home runs. Now I’m going back to playing baseball the way it was meant to be played.” It occurred to me that this is analogous to a professor who publishes only books and who is criticized for never publishing in journals... he rapidly racks up a number of publications in the top journals in his field. Thereafter, he returns to writing books... This is the heart of analogy-making, and it is not in the least clear how relational priming, as implemented by the authors, could begin to deal with this problem."

"Babe Ruth : homerun :: ??? : scientific study"

1."researchers"

2. "scientists"

39 of 59

Rhymes using sums of semantic vectors

spaceship: moon balloon
pillow: head bed
trampoline: elastic gymnastic
rocking chair: knitter sitter
weed: vegetation eradication
novel: fiction depiction
juice: dilute fruit
cookie: naughty biscotti
Oreo: black snack
brain: logical neurological
Lincoln: desegregation oration
Beatles: guitars superstars
honey: bees cheese
flower: bloom perfume, frilly lily
Clinton: she nominee
sun: skylight twilight
Bollywood: Hindi indie, Delhi telly

40 of 59

bear : hiker :: shark : ???

(forest + predator) : (forest + tourist) :: (ocean + predator) : (ocean + tourist)

B + C - A = D

forest + tourist + ocean + predator - forest - predator = snorkeler

41 of 59

Answering questions in semantic vector space

[Figure: the "scientist" simplex (astronomer, physicist, biologist, roboticist, chemist, paleontologist, psychologist) linked to the subjects they study (stars, physics, life, robots, chemicals, dinosaurs) by the relation "X studies Y", with astrophysics marked as a point in the space.]

“Gärdenfors (2000) proposed that concepts should be modeled as convex regions of a conceptual space. While convexity may seem a strong assumption, it is a remarkably regular property of many conceptual representations grounded in perception (e.g., color, taste, vowels) (Jäger 2007).”

42 of 59

Results on QA with simplex

answering queries of the form (entity, relation, ?X)

where ?X doesn’t appear in the KB
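One simplified reading of such a query, sketched below: estimate the relation as an average offset over known (scientist, subject) pairs and look near entity + offset. The pairs are just the examples from the figure, and this stand-in is looser than the simplex-based procedure itself.

    # Simplified sketch of answering (entity, "studies", ?X): average the offsets
    # of known facts and apply that offset to the query entity. A stand-in for the
    # simplex-based procedure, not the paper's exact method; assumes these words
    # exist in the embedding vocabulary.
    import numpy as np
    import gensim.downloader as api

    wv = api.load("word2vec-google-news-300")

    known = [("biologist", "life"), ("chemist", "chemicals"),
             ("paleontologist", "dinosaurs"), ("roboticist", "robots")]
    studies = np.mean([wv[y] - wv[x] for x, y in known], axis=0)

    print(wv.similar_by_vector(wv["astronomer"] + studies, topn=5))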

43 of 59

Part 4: Propositional Deductive Inference with Vectors

The question answering system gives us a region of semantic space where we can find the answer. But we would like to find a proof that this is true.

44 of 59

Propositional Deductive Inference with Vectors

Goal: find a way to represent facts in a deductive tree such that the sum of supporting facts equals the result of the deduction. Decompose a semantic vector not into examples, but into supporting facts.

So if fact1 implies fact2 which implies fact3 which implies fact4, then

fact1 + fact2 + fact3 should equal fact4

45 of 59

Propositional Deductive Inference with Vectors

If A is true, and A implies B, then B is true.

A + (-A + B) = B

If A OR B is true, and NOT A is true, then B is true.

(A + B) + -A = B

But you run into trouble in these cases:

A OR A encodes as A + A

A OR NOT A encodes as A + -A

A AND NOT A encodes as A, -A

So simplify first, then encode, and use the encoding only for inference.

46 of 59

Propositional Deductive Inference with Vectors

47 of 59

Propositional Deductive Inference with Vectors

To encode a sentence as vectors:

  1. Convert the sentence to Conjunctive Normal Form
  2. Replace OR with +, NOT with -, and AND with ,

For example:

NOT((NOT P IMPLIES NOT Q) AND NOT R) = NOT((NOT NOT P OR NOT Q) AND NOT R)

in CNF:

(NOT P OR R) AND (Q OR R)

with replacements:

{-P + R, Q + R}
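A small numpy sketch of this encoding, assuming each atom is assigned a random vector; the clause vectors for the example above are built the same way.

    # Sketch: atoms get random vectors; a CNF clause is the signed sum of its
    # literals; a resolution step like modus ponens becomes vector addition.
    import numpy as np

    rng = np.random.default_rng(4)
    atoms = {a: rng.standard_normal(300) for a in "PQRAB"}

    def clause(*literals):
        """Encode a clause such as ('-P', 'R') as the vector -P + R."""
        total = np.zeros(300)
        for lit in literals:
            sign = -1.0 if lit.startswith("-") else 1.0
            total += sign * atoms[lit.lstrip("-")]
        return total

    # Modus ponens: from A and (A IMPLIES B), i.e. (-A OR B), derive B.
    derived = clause("A") + clause("-A", "B")
    print(np.allclose(derived, atoms["B"]))        # True

    # The worked example, (NOT P OR R) AND (Q OR R), becomes a pair of vectors:
    kb = [clause("-P", "R"), clause("Q", "R")]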

48 of 59

Part 5: Steps towards predicate logic

49 of 59

A directed knowledge graph of common-sense facts

Finding a chain of reasoning means finding a connected path (or tree) in the graph.

[Figure: a small directed graph with nodes John, father, child, road, danger, and stove, and edges labeled "is a", "cares for", "owns", "avoids", "enters", and "exhibits".]

50 of 59

Embedded Knowledge Graph

road: (11,2)
stove: (28,27)
father: (18,15)
danger: (30,12)
John: (13,7)
child: (5,23)

vector from John to father: (18,15) – (13,7) = (5,8)

father to child: (5,23) – (18,15) = (-13,8)

child to road: (11,2) – (5,23) = (6,-21)

road to danger: (30,12) – (11,2) = (19,10)

John to danger: (30,12) – (13,7)

[Figure: the same graph, with each node placed at the coordinates listed above.]

51 of 59

Embedded Knowledge Graph


(5,8) + (-13,8) + (6,-21) + (19,10) = (30,12) – (13,7)

Find vectors that sum to the vector from John to danger to form a chain of reasoning connecting the two ideas.
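A quick check, in code, that the edge vectors telescope: the relation vectors along the John–father–child–road–danger chain sum exactly to the John-to-danger vector.

    # Check: edge-difference vectors along the chain sum to the start-to-end vector.
    import numpy as np

    pos = {"John": (13, 7), "father": (18, 15), "child": (5, 23),
           "road": (11, 2), "danger": (30, 12)}

    def edge(a, b):
        return np.subtract(pos[b], pos[a])

    chain = ["John", "father", "child", "road", "danger"]
    total = sum(edge(a, b) for a, b in zip(chain, chain[1:]))
    print(total, np.array_equal(total, edge("John", "danger")))   # [17  5] True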


52 of 59

Reconstructing Path from the Sum

  • Create a fully connected graph between all terms named anywhere in the sum, including the starting vector and the ending vector.
  • Assign very low weights to known relations
  • Assign low weights to nearby pairs in semantic space
  • Assign high weights to distant pairs in semantic space
  • Find the lowest-cost path from the starting node to the ending node (see the sketch below).
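A minimal sketch of this reconstruction with networkx; the cost values are illustrative placeholders for "known relation", "semantically near", and "semantically distant", not the paper's actual weights.

    # Sketch: fully connect the recovered terms, weight edges by how well-attested
    # each pair is, and take the cheapest path. Costs here are illustrative.
    import networkx as nx

    terms = ["John", "father", "child", "road", "danger"]
    known_relations = {("John", "father"), ("father", "child"),
                       ("child", "road"), ("road", "danger")}

    def cost(a, b):
        if (a, b) in known_relations or (b, a) in known_relations:
            return 0.1     # very low cost: the relation is already in the KB
        return 1.0         # higher cost: would need an associative/analogical hop

    G = nx.Graph()
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            G.add_edge(a, b, weight=cost(a, b))

    print(nx.shortest_path(G, "John", "danger", weight="weight"))
    # ['John', 'father', 'child', 'road', 'danger']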

53 of 59

Deductive, Associative and Analogical Reasoning

  • Valid deductive reasoning when an exact solution exists
  • Associative and analogical reasoning throughout the chain gives the best approximate solution in other cases
    • In the semantic vector space, pairs of words connected by the same offset vector are analogous (king – man + woman ≈ queen)
    • This means analogous relations are found as easily as direct relations by the sparse sum
    • No analogous relation is ever perfect in the vector space, so direct relations are still preferred

54 of 59

Combining simplex QA with chains of reasoning

Step 1: try to answer the question with simplex QA

Step 2: gather potential answers A1..n

Step 3: choose a set of relations which could result in the appropriate relation, based on Horn rules

Step 4: find chains of reasoning leading from Q to each A1..n

Step 5: evaluate each chain for validity based on Horn rules (see the sketch below)
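A control-flow sketch of these five steps; every helper below is a hypothetical stub standing in for the components described earlier (simplex QA, the Horn-rule table, and the graph search), and the return values are toy placeholders.

    # Hypothetical control-flow sketch of steps 1-5; all helpers are toy stubs.
    def simplex_qa(entity, relation):                 # steps 1-2: candidate answers
        return ["lawn", "garden", "meadow"]           # toy values

    def compatible_relations(relation):               # step 3: Horn-rule expansion
        return {"LocatedNear", "AtLocation", "RelatedTo"}

    def find_chains(entity, answer, relations):       # step 4: graph search
        return [[(entity, "AtLocation", answer)]]     # toy single-hop chain

    def chain_is_valid(chain, relation):              # step 5: check against Horn rules
        return all(rel in compatible_relations(relation) for _, rel, _ in chain)

    entity, relation = "grass", "LocatedNear"
    for answer in simplex_qa(entity, relation):
        for chain in find_chains(entity, answer, compatible_relations(relation)):
            if chain_is_valid(chain, relation):
                print(answer, "supported by", chain)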

55 of 59

Example

Grass LocatedNear ?X

Most similar "LocatedNear" facts:

( LocatedNear | husky | washington )

( LocatedNear | nose | eye )

( LocatedNear | bolt | receiver )

top answers: lawn, garden, grassy, vegetation, meadow, pasture, sod, greenery, shrubs, turf, tree, shrub, hay, lawns, park

grass|AtLocation ->lawn 41.359

grass|RelatedTo ->garden 53.894

...

grass|RelatedTo ->grain|RelatedTo ->granary 113.046

...

grass|RelatedTo ->leaf|AtLocation ->floral_arrangement 180.066

56 of 59

Sources of Knowledge

  • The system can take triples from any source: the ontologies do not have to correspond.
  • Currently includes:
    • ConceptNet
  • Future could include:
    • WordNet
    • Cyc
    • NELL (Never Ending Language Learner)
    • Reading the web (AMRs)
    • Other sources

57 of 59

Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.

:op1 (t2 / thing
    :ARG1-of (l / live-01
        :ARG0 (p5 / person
            :name (n4 / name
                :op1 "Dorothy"))
        :location (m / midst
            :mod (p4 / prairie
                :mod (g / great)
                :name (n3 / name
                    :op1 "Kansas"))))
    :name (n2 / name
        :op1 (u / Uncle)
        :op2 (h2 / Henry)))
:op2 (p3 / person
    :ARG0-of (f2 / farm-01))
:op3 (t / thing
    :name (n / name
        :op1 (a / Aunt)
        :op2 (e / Em)))
:op4 (p / person
    :ARG0-of (h / have-rel-role-91
        :ARG1 (p2 / person
            :ARG0-of (f / farm-01))
        :ARG2 (w / wife))))

58 of 59

Concepts revisited

  • include properties and relations
  • be defined by prototypes, examples, and properties, and yet each time the same subject is asked what they are, slightly different answers are generated
  • be used for inference and conceptual combination
  • update many related concepts immediately on learning new information
  • participate in analogies
  • change based on context
  • be defined by relation to other concepts, which themselves seem to be defined by relation to the first concept

59 of 59