1 of 61

Project Aristotle

Almost AGI

Q&A chatbot

Fixed universe

No agency

2 of 61

The philosopher... as a goal!

3 of 61

Project Aristotle

  • Question answering (oracle chatbot)
    • Interactive, so that complicated questions can be put in context.
  • Fixed universe
    • Large database of documents, books and articles.
  • No agency
    • Not connected to any apparatus capable of acting in the world.
  • Knowledge engine
    • Knowledge engine capable of reasoning over the static data.
    • Efficient treatment of stored knowledge.

4 of 61

Factoid question answering

User: What is the date of birth of Mozart?

Aristotle: The famous music composer?

User: Yes.

Aristotle: Wolfgang Amadeus Mozart was born on January 27, 1756.

5 of 61

Complex questions

User: How many different presidents have been in office in the 20th century?

Aristotle: You are referring to US presidents?

User: Yes

Aristotle: Are you referring to full terms only, or do I include partial terms?

User: Include partial terms.

Aristotle: There were 18 distinct presidents who were in office for at least some time in the 20th century.

6 of 61

General intelligence

User: Which 21st-century US president implemented the most efficient policy in terms of economic growth?

Aristotle: Do you equate economic growth to GDP?

User:

7 of 61

Dialectic of representation

8 of 61

The Assayer

Philosophy is written in this grand book, which stands continually open before our eyes (I say the “Universe”), but cannot be understood without first learning to comprehend the language and know the characters in which it is written. It is written in mathematical language, and its characters are triangles, circles and other geometric figures, without which it is impossible to humanly understand a word; without these one is wandering in a dark labyrinth.

Galileo Galilei, The Assayer (1623)

9 of 61

10 of 61

11 of 61

Concept

Instances

12 of 61

Logical positivism

  • Galileo Galilei
  • Bertrand Russell
  • Noam Chomsky
  • Ludwig Wittgenstein

  • Knowledge base
  • Formal inferences
  • Absolute truth
  • Simple, clear and unambiguous

Law: ∀x, x ∈ Man ⇒ x ∈ Mortal

Fact: Socrates ∈ Man

Conclusion: Socrates ∈ Mortal
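
The law, fact and conclusion above can be checked mechanically. A minimal sketch in Lean 4, assuming a hypothetical type `Being` for individuals and predicates `Man` and `Mortal` (illustrative names, not from the slides):

```lean
-- Hypothetical universe of individuals and two predicates over it.
variable (Being : Type) (Man Mortal : Being → Prop)

-- Law:        ∀ x, Man x → Mortal x
-- Fact:       Man socrates
-- Conclusion: Mortal socrates
-- The proof instantiates the law at `socrates` and applies it to the fact
-- (modus ponens).
example (socrates : Being)
    (law : ∀ x, Man x → Mortal x)
    (fact : Man socrates) : Mortal socrates :=
  law socrates fact
```

The proof term is a single function application, which is what makes this style of inference simple, clear and unambiguous.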

13 of 61

Logical positivism

  • Too rigid, borderline cases, exceptions…
  • Is “father” a clear concept?
    • Biological father
    • Father-in-law
    • Legal father
    • And what if the poor guy changed sex?
  • How many countries are there in the world?
    • 195 or 196?
  • What happened on the 10th of September 1752?
  • What is the meaning of “nihilism” ?

14 of 61

Ludwig Wittgenstein

Tractatus

Investigations

15 of 61

Meaning by usage

  • Ludwig Wittgenstein
  • Connectionist language models
  • Word embeddings

16 of 61

Can Quantum Mechanical Description of Physical Reality Be Considered Complete?

Einstein, Podolsky and Rosen

1935

A mix of natural language and mathematics

17 of 61

18 of 61

Dialectic of representation

Logic            Language           World
Discrete         Discrete (almost)  Continuous
Unambiguous      Ambiguous          Raw
Multiple         One                One
Constant         Drifting           Changing
Formal theories  Text               Video
Dense            Dense              Complete

19 of 61

Dialectic of representation

Discrete ↔ Continuous

Idealism ↔ Empiricism

Logic ↔ Senses

Tractatus ↔ Investigations

Word token ↔ Word embedding

Thinking slow ↔ Thinking fast

System 2 ↔ System 1

Modus ponens ↔ Bayes' rule

Transparent ↔ Opaque

Old school AI ↔ Deep learning
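
One pair from the list above, modus ponens versus Bayes' rule, can be sketched in a few lines of Python. The traffic example and all probabilities below are invented for illustration; they are assumptions, not data from the talk.

```python
def modus_ponens(rule, fact):
    """Discrete inference: from (premise -> conclusion) and premise, return conclusion."""
    premise, conclusion = rule
    return conclusion if fact == premise else None


def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Continuous inference: P(H | evidence) computed with Bayes' rule."""
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1.0 - prior)
    return p_evidence_given_h * prior / evidence


# Logic side: the conclusion is either derived or not.
derived = modus_ponens(("traffic", "late"), "traffic")

# Probabilistic side: made-up numbers for P(late), P(traffic | late),
# P(traffic | not late). The result is a degree of belief, not a verdict.
belief_late = bayes_posterior(prior=0.1,
                              p_evidence_given_h=0.8,
                              p_evidence_given_not_h=0.2)
```

The first function returns a symbol or nothing; the second returns a number between 0 and 1, which is the transparent versus opaque contrast in miniature.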

20 of 61

Daniel Kahneman

Daniel Kahneman (born 1934) is an Israeli-American psychologist and economist notable for his work on the psychology of judgment and decision-making, as well as behavioral economics, for which he was awarded the 2002 Nobel Memorial Prize in Economic Sciences. His empirical findings challenge the assumption of human rationality prevailing in modern economic theory.

  • Thinking, Fast and Slow

21 of 61

Thinking, fast and slow

System 2

  • Slow
  • Conscious
  • Rationalism
  • Symbolism
  • Explanation

System 1

  • Fast
  • Unconscious
  • Empiricism
  • Connectionism
  • Rationalization

22 of 61

Thinking, fast and slow

System 2

  • There was a lot of traffic.
  • So:
  • I don’t like public transport.

System 1

  • There was a lot of traffic.
  • So:
  • I was late.

23 of 61

Morphemes and Sememes

24 of 61

Sememes hypothesis

I believe that language has to be treated using discrete units similar to words. Unfortunately, tokenization into words is problematic for several reasons, and the use of morphemes is not necessarily more practical.

In this project we aim to learn a latent semantic vocabulary (sememes) into which every language can be faithfully and efficiently translated.

We hope that this latent sememe language will be useful in the fields of linguistics and natural language processing.

25 of 61

Words and Morphemes

A word is the smallest element that can be uttered in isolation with objective or practical meaning.

A morpheme is the smallest unit of meaning but will not necessarily stand on its own.

26 of 61

A word can concatenate concepts

  • free, freely
  • exact, exactly
  • verbal, nonverbal
  • violent, nonviolent
  • arm, chair, armchair
  • down, load, download

27 of 61

Prefix

antifreeze, defrost, disagree, encode, embrace, forecast, injustice, impossible, interact, midway, misfire, nonsense, overlook, return, semicircle, submarine, superstar, transport

Suffix

personal, hopped, wooden, higher, worker, biggest, careful, linguistic, running, attraction, infinity, plaintive, fearless, quickly, enjoyment, kindness, joyous, comfortable

28 of 61

Compound words

candlestick

campfire

candytuft

cannot

cardboard

carefree

careless

caretaker

carport

cartwheel

catfish

catnap

checkmate

checkroom

checkup

chestnut

chickpea

childbirth

childcare

childfree

childlike

childproof

childrearing

chopstick

clotheshorse

coastline

cobweb

copycat

coldframe

coldhearted

coldsore

commonsense

cookbook

cookout

cooktop

cookware

cornbread

corncob

corndog

cornmeal

cornstalk

cottonmouth

countdown

counterattack

counterbalance

counterweight

countryside

courthouse

cowslip

crabgrass

craftsman

crawfish

crossbow

crossroad

crosswalk

crossword

crowbar

cubbyhole

cupboard

cupcake

29 of 61

Some groups of words act as a single word

  • I read books occasionally.
  • I read books from time to time.
  • I read books often.
  • I read books on a regular basis.
  • I read books compulsively.
  • I read books as much as I can.

French             English
fin de semaine     weekend
en réalité         as a matter of fact
avant              prior to
de temps en temps  occasionally
trotteuse          second hand

30 of 61

Words as a basic unit

English       German        Modern Mandarin
taxi driver   Taxifahrer    出租车司机
taxi driver   taxi driver   go out-rent-car-control-machine

31 of 61

Words and sememes

  • Large dictionary of words: W
  • Small dictionary of sememes: S
  • Very large corpus: U ∈ W*
  • A one-to-one function from words to sets of sememes: F: W → 2^S
  • Typically
    • |U| > 1 000 000 000
    • |W| > 500 000
    • |S| < 1000
    • |F(w)| < 10
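
The definitions above can be made concrete with a toy lexicon. A minimal Python sketch, assuming the word-to-sememes function is called `F` and using a handful of invented entries; the helper names are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical toy lexicon: each word maps to a small set of sememes.
# The entries are illustrative only.
F: Dict[str, FrozenSet[str]] = {
    "cat": frozenset({"cat"}),
    "cats": frozenset({"cat", "s"}),
    "Canadian": frozenset({"Canada", "ian"}),
    "Italian": frozenset({"Italy", "ian"}),
    "French": frozenset({"France", "ian"}),
    "countdown": frozenset({"count", "down"}),
}

W = set(F)                      # word dictionary
S = set().union(*F.values())    # sememe dictionary induced by the lexicon


def is_one_to_one(mapping: Dict[str, FrozenSet[str]]) -> bool:
    """True if no two words share exactly the same set of sememes."""
    return len(set(mapping.values())) == len(mapping)


def max_sememes_per_word(mapping: Dict[str, FrozenSet[str]]) -> int:
    """Largest |F(w)| over the lexicon (the slide asks for fewer than 10)."""
    return max(len(sememes) for sememes in mapping.values())
```

On six words the sememe dictionary is not yet smaller than the word dictionary; the intended saving only appears at scale, when sememes such as `ian` or `s` are shared by many words.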

32 of 61

Example of sememes tokenization

cats       cat▪s
Canadian   Canada▪ian
Italian    Italy▪ian
French     France▪ian
countdown  count▪down
autotomy   casting▪off▪limb

  • Six governors are waiting at the white house.
  • (Six) (governors) (are) (waiting) (at the) (white house)(.)
  • cap▪six elected▪state▪s is▪s wait▪now at_the government▪house▪US .
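
The three steps above (sentence, grouping into phrases, mapping to sememes) can be sketched in Python. The lexicon below is a hypothetical hand-written table covering only this sentence, and the greedy longest-match grouping is an assumption; the project aims to learn the vocabulary rather than write it by hand.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical phrase lexicon: lower-cased word groups -> sememes.
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("six",): ["six"],
    ("governors",): ["elected", "state", "s"],
    ("are",): ["is", "s"],
    ("waiting",): ["wait", "now"],
    ("at", "the"): ["at_the"],
    ("white", "house"): ["government", "house", "US"],
    (".",): ["."],
}
MAX_PHRASE_LEN = max(len(key) for key in LEXICON)


def group_phrases(words: List[str]) -> List[List[str]]:
    """Greedy longest-match grouping of words into known phrases."""
    groups, i = [], 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            if tuple(w.lower() for w in words[i:i + n]) in LEXICON:
                groups.append(words[i:i + n])
                i += n
                break
        else:
            groups.append([words[i]])  # unknown word: keep it as its own group
            i += 1
    return groups


def to_sememes(group: List[str]) -> str:
    """Map one group to sememes; a leading capital becomes the 'cap' sememe."""
    key = tuple(w.lower() for w in group)
    sememes = list(LEXICON.get(key, key))
    if group[0][0].isupper():
        sememes.insert(0, "cap")
    return "▪".join(sememes)


def tokenize(sentence: str) -> str:
    words = re.findall(r"\w+|[^\w\s]", sentence)  # words and punctuation marks
    return " ".join(to_sememes(group) for group in group_phrases(words))
```

Calling `tokenize("Six governors are waiting at the white house.")` reproduces the third line of the slide.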

33 of 61

Embeddings

34 of 61

Embeddings hypothesis

Once a procedure exists to transform documents into tokens, for example using sememes, I believe that independent machinery should produce embeddings from a large corpus of documents without a specific task in mind.

  • A first approach should be used to produce basic embeddings based on co-occurrences of all frequent sememes and groups of sememes.
  • A second approach should process sequences of basic embeddings to produce phrase embeddings.
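
A minimal sketch of the first approach, assuming raw co-occurrence counts in a fixed window serve as the basic embeddings. The window size, the toy corpus and the function names are assumptions; a real system would reweight the counts and reduce their dimension.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, List[int]]:
    """One count vector per token: how often each other token appears within the window."""
    vocab = sorted({token for sentence in sentences for token in sentence})
    counts = {token: Counter() for token in vocab}
    for sentence in sentences:
        for i, token in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[token][sentence[j]] += 1
    return {token: [counts[token][other] for other in vocab] for token in vocab}


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a, norm_b = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus of already tokenized sentences (invented for illustration).
corpus = [
    ["black", "dog", "bite", "white", "cat"],
    ["white", "dog", "bite", "black", "cat"],
    ["six", "governor", "wait"],
]
vectors = cooccurrence_vectors(corpus)
```

Tokens that occur in similar contexts, here `dog` and `cat`, end up with similar vectors, while `dog` and `governor` share no context at all.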

35 of 61

Embeddings

  • Embedding space: ℰ = ℝ^d
  • Sememe embedding function (likely): E(Sᵢ) = Eᵢ ∈ ℰ
  • Encode: S* → ℰ
  • Decode: ℰ → S*
  • Classify
  • Translate

36 of 61

Recursive AutoEncoder (RAE)

The black dog bites the white cat

The black dog bites the white cat

37 of 61

Towards Lossless Encoding of Sentences

Prato, Chandar, Tapp
ACL 2019 Submission

Accuracy for exact and complete phrase reconstruction.

Embedding size: 300, 512 and 1024

38 of 61

Towards Lossless Encoding of Sentences

Sentiment Analysis

Stanford Sentiment Treebank

RAE is our approach

SST-2: complete sentence

SST-5: all sub phrases

39 of 61

Linguistic structure

This pusillanimous Canadian works at the White House.

This pusillanimous Canadian works at the White House.

▪↑ this▪show lack courage▪Canada ian▪work s▪at▪the▪↑ white▪↑ house▪

▪↑ this▪show lack▪courage▪Canada ian▪work s at the▪↑ white ↑ house▪

40 of 61

Linguistic structure

This pusillanimous Canadian works at the White House.

▪↑ this▪show lack▪courage▪Canada ian▪work s at the▪↑ white ↑ house▪

41 of 61

Knowledge representation and reasoning

42 of 61

Formal reasoning does not capture common sense

The airplane is heavy, otherwise I could carry it.
The airplane is not heavy, otherwise it would not fly.

Because cheap horses are rare and rare horses are expensive, I claim that cheap horses are expensive.

This is why old school AI did not succeed.

43 of 61

Independently consistent collection of theories

Quantum Mechanics

General Relativity

Thermodynamics

Newtonian mechanics

Biology

44 of 61

Independently consistent collection of models

Common human

Literary human

Transhuman human

Ideal human

Biology

human

  • All men are equal
  • All men are mortal
  • Some men are not mortal
  • All men are biological
  • Some men are machines
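
The statements above contradict each other when pooled, yet each can hold inside its own model. A minimal Python sketch, assuming each model is a set of (proposition, truth value) pairs. Which statement belongs to which model, and reading "Some men are machines" as the negation of "All men are biological", are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Literal = Tuple[str, bool]  # (proposition, truth value)

# Hypothetical assignment of the slide's statements to two models.
MODELS: Dict[str, Set[Literal]] = {
    "biology": {
        ("all men are mortal", True),
        ("all men are biological", True),
    },
    "transhuman": {
        ("all men are mortal", False),      # "Some men are not mortal"
        ("all men are biological", False),  # "Some men are machines"
    },
}


def is_consistent(literals: Iterable[Literal]) -> bool:
    """A set of literals is consistent if no proposition is both true and false."""
    seen: Dict[str, bool] = {}
    for proposition, value in literals:
        if seen.setdefault(proposition, value) != value:
            return False
    return True


each_model_ok = all(is_consistent(model) for model in MODELS.values())
pooled_ok = is_consistent(set().union(*MODELS.values()))
```

Reasoning is then carried out inside one model at a time; only the pooled collection is contradictory.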

45 of 61

Independently consistent collection of models

Christian

Consumer

Green

NRA

UFO

46 of 61

Knowledge representation and reasoning

  • Rasa NLU is an open-source tool for intent classification and entity extraction.
  • Dates, time, numbers
  • Hard facts
  • Logical reasoning
  • Probabilistic reasoning
  • Clear interpretation
  • Readability
  • Old school AI
  • Expert system

47 of 61

Logic

48 of 61

First Order Logic (FOL)

  • Sets
  • Variables
  • Functions
  • Propositional logic
  • Universal and existential quantifiers
  • Axioms
  • Reasoning engine

Ex 1: All members of MILA have a key and a code.

Ex 2: ∀𝑥, ∃𝑦, (𝑃(𝑥)∨𝑃(𝑦))∧(𝑓(𝑥)=𝑓(𝑦))
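
Over a finite domain, the formula in Ex 2 can be evaluated directly: the universal quantifier becomes `all` and the existential quantifier becomes `any`. A minimal Python sketch; the domain, the predicate `P` and the function `f` used below are arbitrary choices for illustration.

```python
from typing import Callable, Iterable


def holds(domain: Iterable[int],
          P: Callable[[int], bool],
          f: Callable[[int], int]) -> bool:
    """Evaluate  ∀x ∃y (P(x) ∨ P(y)) ∧ (f(x) = f(y))  over a finite domain."""
    domain = list(domain)
    return all(
        any((P(x) or P(y)) and f(x) == f(y) for y in domain)
        for x in domain
    )


def is_even(n: int) -> bool:
    return n % 2 == 0


# With f(n) = n // 2, every odd x is paired with the even y = x - 1.
result_paired = holds(range(4), is_even, lambda n: n // 2)

# With f the identity, the only candidate is y = x, so odd x fails.
result_identity = holds(range(4), is_even, lambda n: n)
```

A reasoning engine works with the axioms symbolically instead of enumerating a domain, but the finite case shows what the quantifiers mean.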

49 of 61

50 of 61

Von Neumann–Bernays–Gödel set theory

NBG is Von Neumann–Bernays–Gödel set theory.

  • NBG is a conservative extension of ZFC.
  • NBG is finitely axiomatizable in FOL.

NBG can express all sciences!

51 of 61

52 of 61

Knowledge representation and reasoning

  • Dates, time, numbers
  • Hard facts
  • Logical reasoning
  • Probabilistic reasoning
  • Clear interpretation
  • Readability
  • Old school AI
  • Expert system

53 of 61

System 2 = FOL

  • First order logic is necessary and sufficient
  • Gödel completeness
  • Universal (HOL in FOL)
  • Independently consistent collection of theories
  • Powerful reasoning engine
  • Easy to translate into English

54 of 61

Metamath

Metamath is a language for developing strictly formalized mathematical definitions and proofs accompanied by a proof checker for this language and a growing database of thousands of proved theorems covering conventional results in logic, set theory, number theory, group theory, algebra, analysis, and topology, as well as topics in Hilbert spaces and quantum logic.

55 of 61

Aristotle architecture

56 of 61

Project Aristotle

  • Sememes
  • Embeddings
  • The world is a collection of independently consistent models
  • Reasoning engine based on FOL

57 of 61

58 of 61

Memory

Linguistic

Learning

Logic

59 of 61

60 of 61

Project Aristotle

  • MILA must compete
  • Google, Amazon and Facebook
  • Goal oriented dialogue
  • AGI complete
  • Based on the knowledge dialectic
  • Modular approach
  • Research and partner
  • Create resources
  • Clean data
  • Open source architecture

61 of 61

Aristotle

Knowledge

  • Wikipedia
  • Stanford Philosophy
  • Archive

Knowledge graph and ontology

Linguistic

  • Tokenise()
  • Embedding()
  • Encode() and Decode()
  • Classify()

  • Logic
    • AddRule()
    • Prove()
    • GetMissing()
  • Learning
    • Translate()
    • ParaphraseDegree()
    • EntailementDegree()
    • Formalyse()
    • Verbalyse()
  • Memory
    • DocumentHash()
    • QueryHash()
    • Query()
  • Engine
    • Goal oriented dialogue
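
To make the Logic module more concrete, here is a minimal Python sketch of `AddRule()`, `Prove()` and `GetMissing()`. It is only a guess at the intended behaviour: it assumes propositional Horn rules and forward chaining, and it interprets `GetMissing()` as "which premises are still needed to prove a goal". None of this is specified on the slide.

```python
from typing import Iterable, List, Set, Tuple


class Logic:
    """Hypothetical Logic module: Horn rules over plain string propositions."""

    def __init__(self) -> None:
        self.facts: Set[str] = set()
        self.rules: List[Tuple[frozenset, str]] = []

    def add_rule(self, premises: Iterable[str], conclusion: str) -> None:
        """AddRule(): a rule with no premises is stored as a fact."""
        premises = frozenset(premises)
        if premises:
            self.rules.append((premises, conclusion))
        else:
            self.facts.add(conclusion)

    def _closure(self) -> Set[str]:
        """Forward chaining: apply rules until nothing new can be derived."""
        known = set(self.facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in self.rules:
                if premises <= known and conclusion not in known:
                    known.add(conclusion)
                    changed = True
        return known

    def prove(self, goal: str) -> bool:
        """Prove(): True if the goal follows from the stored facts and rules."""
        return goal in self._closure()

    def get_missing(self, goal: str) -> Set[str]:
        """GetMissing(): smallest set of underivable premises among the rules for the goal."""
        known = self._closure()
        if goal in known:
            return set()
        candidates = [set(premises) - known
                      for premises, conclusion in self.rules if conclusion == goal]
        return min(candidates, key=len) if candidates else {goal}
```

In a dialogue, the engine could turn the result of `get_missing` into a clarifying question to the user, then store the answer with `add_rule` and try `prove` again.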