1 of 78

Machine Translation: Introduction

Slides from Dan Jurafsky

Università di Pisa

Human Language Technologies

Dipartimento di Informatica

Università di Pisa

2 of 78

Outline

  • Introduction and a little history
  • Language Similarities and Divergences
  • Three classic MT Approaches
    • Transfer
    • Interlingua
    • Direct
  • Modern Statistical MT
  • Neural MT
  • Evaluation

3 of 78

What is MT?

Translating a text from one language (the source language) to another (the target language) automatically

4 of 78

Criticism of Altavista, Umberto Eco. 2007

Babelfish 2004 (English Original ⇨ Italian Translation)

  • The Works of Shakespeare ⇨ Gli impianti di Shakespeare
  • Hartcourt Brace ⇨ sostegno di Hartcourt
  • Speaker of the chamber of deputies ⇨ Altoparlante dell’alloggiamento dei delegati
  • Studies in the logic of Charles Sanders Pierce ⇨ Studi nella logica delle sabbiatrici Pierce del Charles

Google 2007 (English Original ⇨ Italian Translation)

  • The Works of Shakespeare ⇨ Le opere di Shakespeare
  • Hartcourt Brace ⇨ Hartcourt Brace
  • Speaker of the chamber of deputies ⇨ Presidente della Camera dei deputati
  • Studies in the logic of Charles Sanders Pierce ⇨ Studi nella logica di Charles Sanders Peirce

5 of 78

Google Translate

  • The translation

http://translate.google.com/translate?hl=en&sl=es&tl=en&u=http%3A%2F%2Fwww.cocinadominicana.com%2Facompanamientos-ensaladas-pastelones%2F1907-tostones.html

  • The original recipe for tostones

http://www.cocinadominicana.com/acompanamientos-ensaladas-pastelones/1907-tostones.html

Tostones are green plantain (or male) slices, fried, flattened, and then fried again.

Los tostones son rodajas de plátanos verdes (o machos), fritas, aplanadas y luego fritas nuevamente.

6 of 78

Google Translate

7 of 78

Machine Translation

The Story of the Stone (“The Dream of the Red Chamber”)

    • Cao Xueqin 1792

Chinese gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come

Hawkes translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai. Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

8 of 78

Machine Translation

  • Issues:
    • Sentence segmentation: 4 English sentences to 1 Chinese
    • Grammatical differences
      • Chinese rarely marks tense:
        • As, turned to, had begun
        • tou ⇨ penetrated
      • No pronouns or articles in Chinese
    • Stylistic and cultural differences
      • Bamboo tip plantain leaf ⇨ bamboos and plantains
      • Ma ‘curtain’ ⇨ curtains of her bed
      • Rain sound sigh drop ⇨ insistent rustle of the rain

9 of 78

Alignment in Machine Translation

10 of 78

Not just literature

Hansards: Canadian parliamentary proceedings

11 of 78

What is MT already good enough for?

  • Tasks for which a rough translation is fine
    • Extracting information (finding recipes!)
    • Web pages
    • email
  • Tasks for which MT can be post-edited
    • MT as first pass
    • Computer-aided human translation
  • Tasks in sublanguage domains where high-quality MT is possible
    • FAHQT (Fully Automatic High Quality Translation)

12 of 78

What is MT less good for?

  • Really hard stuff
    • Literature
    • Natural spoken speech (meetings, court reporting)

  • Really important stuff
    • Medical translation in hospitals
    • Emergency phone calls

13 of 78

MT Early History

1946 Booth and Weaver discuss MT at Rockefeller Foundation in New York

1947-48 idea of dictionary-based direct translation

1949 Weaver memorandum popularized idea

1952 all 18 MT researchers in world meet at MIT

1954 IBM/Georgetown Demo Russian-English MT

1955-65 lots of labs take up MT

14 of 78

IBM/Georgetown Thinking Machine

15 of 78

Warren Weaver memo

  • http://www.stanford.edu/class/linguist289/weaver001.pdf
  • “There are certain invariant properties which are… to some statistically useful degree, common to all languages.”
  • On March 4, 1947, “having considerable exposure to computer design problems during the war, and being aware of the speed, capacity, and logical flexibility possible in modern electronic computers”, Weaver suggested that computers be used for translation

16 of 78

Early Research

  • MT research began in the early 1950s on machines less powerful than today’s calculators
  • Concurrent with foundational work on automata, formal languages, probabilities, and information theory
  • MT heavily funded by military, but basically just simple rule-based systems doing word substitution
  • Human language is more complicated than that, and varies more across languages!
  • Little understanding of natural language syntax, semantics, pragmatics
  • Problem soon appeared intractable

17 of 78

History of MT: Pessimism

1959/1960: Bar-Hillel “Report on the state of MT in US and GB”

    • Argued FAHQT too hard (semantic ambiguity, etc.)
    • Should work on semi-automatic instead of automatic
    • His argument: “Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.”
    • Only human knowledge lets us know that ‘playpens’ are bigger than boxes, but ‘writing pens’ are smaller
    • His claim: we would have to encode all of human knowledge

18 of 78

History of MT: Pessimism

The ALPAC report

    • Headed by John R. Pierce of Bell Labs
    • Conclusions:
      • Supply of human translators exceeds demand
      • All the Soviet literature is already being translated
      • MT has been a failure: all current MT work had to be post-edited
      • Sponsored evaluations which showed that intelligibility and informativeness were worse than for human translations
    • Results:
      • MT research suffered
        • Funding loss
        • Number of research labs declined
        • Association for Machine Translation and Computational Linguistics dropped MT from its name

19 of 78

History of MT: Revival

1976 Meteo, weather forecasts from English to French

Systran (Babelfish) in use for 50 years

1970’s

European focus in MT; mainly ignored in US

1980’s

ideas of using early AI techniques in MT (KBMT, CMU)

Focus on “interlingua” systems, especially in Japan

1990’s

Commercial MT systems

Statistical MT

Speech-to-speech translation

2000’s

Statistical MT takes off

Google Translate

2015

Neural MT takes off

20 of 78

Language Similarities and Divergences

  • Some aspects of human language are universal or near-universal, others diverge greatly
  • Typology: the study of systematic cross-linguistic similarities and differences
  • What are the dimensions along which human languages vary?

21 of 78

Morphology

Morpheme

    • Minimal meaningful unit of language

Word = Morpheme+Morpheme+Morpheme+…

Stems: root plus derivational morphemes

    • hope+ing 🡪 hoping hop 🡪 hopping

Lemma: also called base form, root, lexeme

    • hoping 🡪 hope hopping 🡪 hop

Affixes

    • Prefixes: anti-, dis- in Antidisestablishmentarianism
    • Suffixes: -ment, -arian, -ism in Antidisestablishmentarianism
    • Infixes: hingi (borrow) – humingi (borrower) in Tagalog
    • Circumfixes: sagen (say) – gesagt (said) in German

22 of 78

Morphological Variation

Isolating languages

    • Cantonese, Vietnamese: each word generally has one morpheme

Agglutinative languages

    • Turkish: morphemes have clean boundaries

Polysynthetic languages

    • Siberian Yupik (‘Eskimo’): single word may have many morphemes

Fusion languages

    • Russian: single affix may have many morphemes

23 of 78

One word one phrase

  • Turkish

uygarlaştıramadıklarımızdanmışsınızcasına

uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına

Behaving as if you are among those whom we could not cause to become civilized

  • German

Donaudampfschiffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft

Danube steam shipping electricity main plant construction subordinate company

Donaudampfschifffahrtsgesellschaftskapitän

Donau+dampf+Schiff+Fahrt+s+gesellschafts+kapitän

Danube steam shipping company captain

24 of 78

Index of synthesis

Slide from Holger Diessel

Scale from isolating to synthetic, with Vietnamese, English, Russian, and Oneida as example languages

25 of 78

Isolating language

Vietnamese

Khi tôi đến nhà bạn, chúng tôi bắt đầu làm bài.

When I come house friend PL I begin do lesson

When I came to my friend’s house, we began to do lessons.

Cantonese

keui wa chyuhn gwok jeui daaih gaan nguk haih li gaan

he say entire country most big building house is this building

Slide from Holger Diessel

26 of 78

Synthetic language

Kirundi

Y-a-bi-gur-i-ye abâna

CL1-PST-CL8.them-buy-APPL-ASP CL2.children

He bought them for the children.

Slide from Holger Diessel

27 of 78

Polysynthetic language

Noun-incorporation (cf. fox-hunting, bird-watching)

Mohawk

a. r-ukwe’t-í:yo

he-person-nice

He is a nice person

b. wa-hi-sereth-óhare-‘se

PST-he/me-car-wash-for

He car-wash for me (= He washed my car)

c. kvtsyu v-kuwa-nya’t-ó:’ase

fish FUT-they/her-throat-slit

They will throat-slit a fish

Slide from Holger Diessel

28 of 78

Index of fusion

Scale from agglutinative to fusional, with Swahili, Russian, and Oneida as example languages

Slide from Holger Diessel

29 of 78

Agglutinative language

Words are formed by stringing together morphemes without changing them

Turkish

              SG          PL
Nominative    adam        adam-lar
Accusative    adam-ı      adam-lar-ı
Genitive      adam-ın     adam-lar-ın
Dative        adam-a      adam-lar-a
Locative      adam-da     adam-lar-da
Ablative      adam-dan    adam-lar-dan

Slide from Holger Diessel
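The concatenative pattern in the adam paradigm can be sketched in a few lines of Python. This is only an illustration: the function name `inflect` and the suffix table are assumptions, the suffixes are limited to forms shown on the slide, and real Turkish suffixes change with vowel harmony, which the sketch ignores.

```python
# Suffix forms copied from the "adam" paradigm on the slide.
# Other stems would need vowel harmony, which this sketch deliberately ignores.
PLURAL = "lar"
CASE_SUFFIX = {"nominative": "", "dative": "a", "locative": "da", "ablative": "dan"}

def inflect(stem, case, plural=False):
    """Return the morphemes of the inflected form, in surface order."""
    morphemes = [stem]
    if plural:
        morphemes.append(PLURAL)
    if CASE_SUFFIX[case]:
        morphemes.append(CASE_SUFFIX[case])
    return morphemes

# Print singular and plural forms with explicit morpheme boundaries.
for case in CASE_SUFFIX:
    print(case, "-".join(inflect("adam", case)), "-".join(inflect("adam", case, plural=True)))
```

Each morpheme keeps its shape and contributes one feature (number or case), which is the property that separates agglutinative from fusional morphology.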

30 of 78

Fusional language

A single inflectional morpheme denotes multiple grammatical features, e.g. both tense and person

Russian

                 SG         PL          SG        PL
Nominative       stol       stol-y      lip-a     lip-y
Accusative       stol       stol-y      lip-u     lip-y
Genitive         stol-a     stol-ov     lip-y     lip
Dative           stol-u     stol-am     lip-e     lip-am
Instrumental     stol-om    stol-ami    lip-oj    lip-ami
Prepositional    stol-e     stol-ax     lip-e     lip-ax

Slide from Holger Diessel

31 of 78

Word Order

  • SVO (Subject-Verb-Object) languages
    • English, German, French, Mandarin
  • SOV Languages
    • Japanese, Hindi

  • VSO languages
    • Irish, Classical Arabic
  • SVO languages generally use prepositions: to Yuriko
  • SOV languages generally use postpositions: Yuriko ni

32 of 78

Segmentation Variation

  • Not every writing system has word boundaries marked
    • Chinese, Japanese, Thai, Vietnamese
  • Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences:
    • Modern Standard Arabic, Chinese

33 of 78

Inferential Load: cold vs. hot langs

  • Some ‘cold’ languages require the hearer to do more “figuring out” of who the various actors in the various events are:
    • Japanese, Chinese
  • Other ‘hot’ languages are pretty explicit about saying who did what to whom:
    • English

34 of 78

Inferential Load (2)

All noun phrases in blue do not appear in the Chinese text … But they are needed for a good translation

35 of 78

Lexical Divergences

  • Word to phrases:
    • English “computer science” = French “informatique”
  • POS divergences
    • English: ‘she likes/VERB to sing’
    • German: ‘Sie singt gerne/ADV’
    • English: ‘I’m hungry/ADJ’
    • Spanish: ‘tengo hambre/NOUN’

36 of 78

Lexical Divergences: Specificity

Grammatical constraints

    • English has gender on pronouns, Mandarin does not.
      • So translating “3rd person” from Chinese to English, need to figure out gender of the person!
      • Similarly from English “they” to French “ils/elles”

Semantic constraints

    • English: ‘brother’
    • Mandarin: ‘gege’ (older) versus ‘didi’ (younger)
    • English: ‘wall’
    • German: ‘Wand’ (inside) ‘Mauer’ (outside)
    • German: ‘Berg’
    • English: ‘hill’ or ‘mountain’

37 of 78

Lexical Divergence: many-to-many

38 of 78

Lexical Divergence: lexical gaps

  • Japanese: no word for privacy
  • English: no word for Cantonese ‘haauseun’ or Japanese ‘oyakoko’ (something like ‘filial piety’)

  • English ‘cow’ vs. ‘beef’, Cantonese ‘ngau’
  • English “fish”, Spanish “pez” vs. “pescado”
  • Italian “maiale”, English “pig” vs. “pork”

39 of 78

Event-to-argument divergences

  • English
    • The bottle floated out.
  • Spanish
    • La botella salió flotando.
    • The bottle exited floating

  • Verb-framed lang: mark direction of motion on verb
    • Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
  • Satellite-framed lang: mark direction of motion on satellite
    • Crawl out, float off, jump down, walk over to, run after
    • Rest of Indo-European, Hungarian, Finnish, Chinese

40 of 78

Structural divergences

  • German: Wir treffen uns am Mittwoch
  • English: We’ll meet on Wednesday

41 of 78

Head Swapping

  • English: X swim across Y
  • Spanish: X cruzar Y nadando

  • English: I like to eat
  • German: Ich esse gern

  • English: I’d prefer vanilla
  • German: Mir wäre Vanille lieber

42 of 78

Thematic divergence

  • Spanish: Y me gusta
  • English: I like Y

  • German: Mir fällt der Termin ein
  • English: I remember the date

43 of 78

Divergence counts from Bonnie Dorr

  • 32% of sentences in UN Spanish/English Corpus (5K)

  • Categorial: X tener hambre ⇨ X have hunger (98%)
  • Conflational: X dar puñaladas a Z ⇨ X stab Z (83%)
  • Structural: X entrar en Y ⇨ X enter Y (35%)
  • Head Swapping: X cruzar Y nadando ⇨ X swim across Y (8%)
  • Thematic: X gustar a Y ⇨ Y likes X (6%)

44 of 78

3 “Classical” methods for MT

  • Direct
  • Transfer
  • Interlingua

45 of 78

Three MT Approaches: Direct, Transfer, Interlingual

46 of 78

Direct Translation

  • Proceed word-by-word through text
  • Translating each word
  • No intermediate structures except morphology
  • Knowledge is in the form of
    • Huge bilingual dictionary
    • word-to-word translation information
  • After word translation, can do simple reordering
    • Adjective ordering English -> French/Spanish
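The two stages listed above, word-by-word lookup followed by simple reordering, can be sketched as follows. The toy lexicon, its POS tags, and the function name `direct_translate` are assumptions made for illustration; in particular, "the" is hard-coded as "la", so gender agreement is ignored.

```python
# Toy bilingual dictionary (an assumption for illustration): English -> (Spanish, POS).
LEXICON = {
    "the": ("la", "DET"),
    "green": ("verde", "ADJ"),
    "witch": ("bruja", "NOUN"),
    "sings": ("canta", "VERB"),
}

def direct_translate(sentence):
    # Step 1: word-by-word lookup; unknown words are passed through unchanged.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    # Step 2: simple local reordering, ADJ NOUN -> NOUN ADJ.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return " ".join(word for word, _ in tagged)

print(direct_translate("the green witch sings"))  # la bruja verde canta
```

Everything the system knows lives in the dictionary and in one local reordering pattern, which is why the approach is cheap but struggles with longer-distance differences between languages.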

47 of 78

Direct MT Dictionary entry

48 of 78

Direct MT

49 of 78

Problems with direct MT

  • German

  • Chinese

50 of 78

The Transfer Model

  • Idea: apply contrastive knowledge, i.e., knowledge about the difference between two languages
  • Steps:
    • Analysis: Syntactically parse Source language
    • Transfer: Rules to turn this parse into parse for Target language
    • Generation: Generate Target sentence from parse tree
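The analysis, transfer, and generation steps can be sketched on a single noun phrase. This is a minimal illustration under stated assumptions: the "parser" is a stand-in that wraps tagged words in one flat NP, the lexicons are toy data, and the function names are hypothetical. The transfer rule is the English-to-French adjective-noun swap.

```python
POS = {"the": "DET", "red": "ADJ", "house": "NOUN"}        # toy analysis lexicon
EN_FR = {"the": "la", "red": "rouge", "house": "maison"}   # toy lexical transfer table

def analyze(sentence):
    """Analysis: a stand-in parser that wraps the tagged words in one flat NP."""
    return ("NP", [(POS[w], w) for w in sentence.lower().split()])

def transfer(tree):
    """Transfer: rewrite ADJ NOUN as NOUN ADJ among the children of the NP."""
    label, children = tree
    out, i = [], 0
    while i < len(children):
        if (i + 1 < len(children) and children[i][0] == "ADJ"
                and children[i + 1][0] == "NOUN"):
            out += [children[i + 1], children[i]]
            i += 2
        else:
            out.append(children[i])
            i += 1
    return (label, out)

def generate(tree):
    """Generation: look up each leaf and read the leaves off left to right."""
    return " ".join(EN_FR[word] for _, word in tree[1])

print(generate(transfer(analyze("the red house"))))  # la maison rouge
```

Unlike the direct approach, the reordering rule here operates on the parse structure rather than on the translated word string, so the same rule would apply to any NP the parser produces.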

51 of 78

English to French

  • Generally
    • English: Adjective Noun
    • French: Noun Adjective
    • Note: not always true
      • ‘Route mauvaise’ -> ‘bad road, badly-paved road’
      • ‘Mauvaise route’ -> ‘wrong road’
      • but is a reasonable first approximation
    • Rule:

52 of 78

Transfer rules

Japanese

53 of 78

Lexical transfer

  • Transfer-based systems also need lexical transfer rules
  • Bilingual dictionary (like for direct MT)

English “home” ⇨ German:

    • nach Hause (going home)
    • Heim (home game)
    • Heimat (homeland, home country)
    • zu Hause (at home)

  • Can list “at home <-> zu Hause”
  • Or do Word Sense Disambiguation
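The first option, listing multi-word entries such as “at home <-> zu Hause”, amounts to a longest-match lookup in a phrase table. The sketch below assumes a tiny table built from entries on the slide; the name `lookup` and the fallback of bare "home" to "Heim" are illustrative choices, not part of any particular system.

```python
# Entries taken from the slide; multi-word keys spell out the disambiguating context.
PHRASES = {
    ("at", "home"): "zu Hause",
    ("home", "country"): "Heimat",
    ("home",): "Heim",  # fallback when no listed context matches (an assumption)
}
MAX_LEN = max(len(key) for key in PHRASES)

def lookup(tokens, i):
    """Return (German string, tokens consumed) for the longest entry starting at i."""
    for n in range(MAX_LEN, 0, -1):
        key = tuple(tokens[i:i + n])
        if len(key) == n and key in PHRASES:
            return PHRASES[key], n
    return None, 1  # no entry: the caller decides what to do with tokens[i]

print(lookup("she is at home".split(), 2))    # ('zu Hause', 2)
print(lookup("my home country".split(), 1))   # ('Heimat', 2)
```

The alternative named on the slide, word sense disambiguation, would replace the explicit context keys with a classifier that picks a sense for "home" from its surroundings.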

54 of 78

Systran: combining direct and transfer

  • Analysis
    • Morphological analysis, POS tagging
    • Chunking of NPs, PPs, phrases
    • Shallow dependency parsing
  • Transfer
    • Translation of idioms
    • Word sense disambiguation
    • Assigning prepositions based on governing verbs
  • Synthesis
    • Apply rich bilingual dictionary
    • Deal with reordering
    • Morphological generation

55 of 78

Transfer: some problems

  • N² sets of transfer rules!
  • Grammar and lexicon full of language-specific stuff
  • Hard to build, hard to maintain
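The N² figure can be made concrete with a quick count. The sketch assumes one rule set per ordered pair of distinct languages, which gives N(N-1), roughly N². For comparison it also counts the 2N components a design with a shared meaning representation would need (one analyzer and one generator per language); the function names are hypothetical.

```python
def transfer_rule_sets(n):
    """One rule set per ordered (source, target) pair of distinct languages."""
    return n * (n - 1)

def shared_representation_components(n):
    """One analyzer into, and one generator out of, a shared representation per language."""
    return 2 * n

# Compare how the two counts grow with the number of languages.
for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), shared_representation_components(n))
```

For 10 languages the count is 90 rule sets against 20 components, and the gap widens quadratically as languages are added.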

56 of 78

Interlingua

  • Intuition: Instead of lang-lang knowledge rules, use the meaning of the sentence to help
  • Steps:
    1. translate source sentence into meaning representation
    2. generate target sentence from meaning.

57 of 78

Interlingua

Mary did not slap the green witch

58 of 78

Interlingua

  • Idea is that some of the MT work that we need to do is part of other NLP tasks
  • E.g., disambiguating En:book Es:‘libro’ from En:book Es:‘reservar’
  • So we could have concepts like BOOKVOLUME and RESERVE and solve this problem once for each language
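A minimal sketch of this idea: each language maps its own words to shared concepts such as BOOKVOLUME and RESERVE, and each target language only needs a generator from concepts. The lexicons below are toy data, the POS tag is used as a stand-in for real word sense disambiguation, and the French entries are added purely to show a second target reusing the same analysis.

```python
# Hypothetical concept inventory and lexicons, for illustration only.
EN_SENSES = {("book", "NOUN"): "BOOKVOLUME", ("book", "VERB"): "RESERVE"}
ES_WORDS = {"BOOKVOLUME": "libro", "RESERVE": "reservar"}
FR_WORDS = {"BOOKVOLUME": "livre", "RESERVE": "réserver"}

def analyze_en(word, pos):
    """Source side: map an English word to a concept (POS stands in for real WSD)."""
    return EN_SENSES[(word, pos)]

def generate(concept, lexicon):
    """Target side: realize a concept using the target language's own lexicon."""
    return lexicon[concept]

concept = analyze_en("book", "VERB")
print(concept, generate(concept, ES_WORDS), generate(concept, FR_WORDS))
```

The English disambiguation is done once and serves every target language, which is the sense in which the problem is solved once per language rather than once per language pair.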

59 of 78

Direct MT: pros and cons (Bonnie Dorr)

Pros

  • Fast
  • Simple
  • Cheap
  • No translation rules hidden in lexicon

Cons

  • Unreliable
  • Not powerful
  • Rule proliferation
  • Requires lots of context
  • Major restructuring after lexical substitution

60 of 78

Interlingual MT: pros and cons (B. Dorr)

Pros

  • Avoids the N² problem
  • Easier to write rules

Cons

  • Semantics is HARD
  • Useful information lost (paraphrase)

61 of 78

Moving toward Statistical MT

62 of 78

Warren Weaver (1947)

When I look at an article in Russian, I say to myself: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.

Kevin Knight slide

63 of 78

Rosetta Stone

Carved in 196 BC

Found in 1799

Decoded in 1822

Egyptian hieroglyphs

Egyptian Demotic

Greek

Kevin Knight slide

64 of 78

Centauri/Arcturan [Knight, 1997]

Your assignment, translate this to Arcturan:

farok crrrok hihok yorok clok kantok ok-yurp

Kevin Knight slide

65 of 78

Centauri/Arcturan Parallel Corpus

Slide from Kevin Knight

1a. ok-voon ororok sprok .
1b. at-voon bichat dat .

2a. ok-drubel ok-voon anok plok sprok .
2b. at-drubel at-voon pippat rrat dat .

3a. erok sprok izok hihok ghirok .
3b. totat dat arrat vat hilat .

4a. ok-voon anok drok brok jok .
4b. at-voon krat pippat sat lat .

5a. wiwok farok izok stok .
5b. totat jjat quat cat .

6a. lalok sprok izok jok stok .
6b. wat dat krat quat cat .

7a. lalok farok ororok lalok sprok izok enemok .
7b. wat jjat bichat wat dat vat eneat .

8a. lalok brok anok plok nok .
8b. iat lat pippat rrat nnat .

9a. wiwok nok izok kantok ok-yurp .
9b. totat nnat quat oloat at-yurp .

10a. lalok mok nok yorok ghirok clok .
10b. wat nnat gat mat bat hilat .

11a. lalok nok crrrok hihok yorok zanzanok .
11b. wat nnat arrat mat zanzanat .

12a. lalok rarok nok izok hihok mok .
12b. wat nnat forat arrat vat gat .

66 of 78

Centauri to Arcturan Translation

Slide from Kevin Knight

Translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

67 of 78

Centauri/Arcturan Alignment

Translating this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

Slide from Kevin Knight

68 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

???

Slide from Kevin Knight

69 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

Slide from Kevin Knight

70 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

Slide from Kevin Knight

71 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

Slide from Kevin Knight

72 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

???

Slide from Kevin Knight

73 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

Slide from Kevin Knight

74 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

process of elimination

Slide from Kevin Knight

75 of 78

Centauri/Arcturan Alignment

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

cognate?

Slide from Kevin Knight

76 of 78

Centauri/Arcturan Alignment

Your assignment, put these words in order: { jjat, arrat, mat, bat, oloat, at-yurp }

zero fertility

Slide from Kevin Knight
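The first step of the alignment exercise, finding which Arcturan words always accompany a given Centauri word, can be sketched with set intersection over the twelve sentence pairs. This is a deliberately simple stand-in for illustration, not a statistical alignment model: the function name `candidates` is hypothetical, and words seen in only one sentence pair stay ambiguous until the other clues on these slides (process of elimination, cognates, zero fertility) are applied by hand.

```python
# The twelve Centauri/Arcturan sentence pairs from the parallel corpus slide.
CORPUS = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
    ("wiwok farok izok stok", "totat jjat quat cat"),
    ("lalok sprok izok jok stok", "wat dat krat quat cat"),
    ("lalok farok ororok lalok sprok izok enemok", "wat jjat bichat wat dat vat eneat"),
    ("lalok brok anok plok nok", "iat lat pippat rrat nnat"),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok", "wat nnat gat mat bat hilat"),
    ("lalok nok crrrok hihok yorok zanzanok", "wat nnat arrat mat zanzanat"),
    ("lalok rarok nok izok hihok mok", "wat nnat forat arrat vat gat"),
]

def candidates(word):
    """Arcturan words present in every pair whose Centauri side contains `word`."""
    sets = [set(arc.split()) for cen, arc in CORPUS if word in cen.split()]
    return set.intersection(*sets) if sets else set()

# Show the surviving candidates for each word of the assignment sentence.
for w in "farok crrrok hihok yorok clok kantok ok-yurp".split():
    print(w, sorted(candidates(w)))
```

Here farok and hihok each narrow to a single candidate (jjat and arrat), while yorok still has three; ruling out wat and nnat for yorok requires noticing that they are better explained by lalok and nok, which is the process-of-elimination step.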

77 of 78

It’s Really Spanish/English

Clients do not sell pharmaceuticals in Europe => Clientes no venden medicinas en Europa

1a. Garcia and associates .
1b. Garcia y asociados .

2a. Carlos Garcia has three associates .
2b. Carlos Garcia tiene tres asociados .

3a. his associates are not strong .
3b. sus asociados no son fuertes .

4a. Garcia has a company also .
4b. Garcia tambien tiene una empresa .

5a. its clients are angry .
5b. sus clientes estan enfadados .

6a. the associates are also angry .
6b. los asociados tambien estan enfadados .

7a. the clients and the associates are enemies .
7b. los clientes y los asociados son enemigos .

8a. the company has three groups .
8b. la empresa tiene tres grupos .

9a. its groups are in Europe .
9b. sus grupos estan en Europa .

10a. the modern groups sell strong pharmaceuticals .
10b. los grupos modernos venden medicinas fuertes .

11a. the groups do not sell zanzanine .
11b. los grupos no venden zanzanina .

12a. the small groups are not modern .
12b. los grupos pequenos no son modernos .

Slide from Kevin Knight

78 of 78

Summary

  • Intro and a little history
  • Language Similarities and Divergences
  • Three classic MT Approaches
    • Transfer
    • Interlingua
    • Direct