HOW COMPUTERS REPRESENT AND GENERATE MEANING
UNIT 2, MODULE 2.5
MODULE OBJECTIVES
Students should be able to:
HOW DO COMPUTERS REPRESENT AND GENERATE MEANING?
“WHAT ARE THE LARGEST CITIES IN GEORGIA BY AREA?”
FIND x WHERE
    class(x) = “city” AND
    geographic_area(x) is large AND
    location(x) = “Georgia”
SORT BY geographic_area
Module 2.3 Flashback - How do computers understand meaning?
Structure of the query
Intent (meaning) of the query
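The structured query above can be sketched in Python to show how a computer might act on it once the meaning has been extracted. This is only an illustration: the place records are invented, and the names `Place`, `LARGE_AREA_THRESHOLD`, and `find_largest_cities` are hypothetical. The cutoff for "large" is an assumption, since the query does not say how large is large.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    place_class: str      # for example "city" or "town"
    location: str
    area_sq_miles: float


# Invented records for illustration only; these are not real places or areas.
PLACES = [
    Place("City A", "city", "Georgia", 350.0),
    Place("City B", "city", "Georgia", 120.0),
    Place("Town C", "town", "Georgia", 500.0),
    Place("City D", "city", "Ohio", 900.0),
    Place("City E", "city", "Georgia", 15.0),
]

# Assumption: the query never defines "large", so we pick a cutoff ourselves.
LARGE_AREA_THRESHOLD = 100.0


def find_largest_cities(places, location, min_area=LARGE_AREA_THRESHOLD):
    """FIND x WHERE class = city AND area is large AND location matches."""
    matches = [
        p for p in places
        if p.place_class == "city"
        and p.area_sq_miles >= min_area
        and p.location == location
    ]
    # SORT BY geographic_area, largest first
    return sorted(matches, key=lambda p: p.area_sq_miles, reverse=True)


results = find_largest_cities(PLACES, "Georgia")
for place in results:
    print(place.name, place.area_sq_miles)
```

Each condition in the query becomes one test in the filter, and the SORT BY line becomes the sort at the end.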
WE’RE GOING TO LOOK AT MACHINE TRANSLATION BECAUSE THAT WILL GIVE US A BETTER PICTURE OF WHAT COMPUTERS HAVE TO DO TO UNDERSTAND MEANING.
WHY IS MACHINE TRANSLATION HARD?
VIDEO: REAL-TIME MACHINE TRANSLATION
https://www.youtube.com/watch?v=K4GMBWn-CMI (2:00-2:18)
TRY IT #1: EXPERIMENT WITH GOOGLE TRANSLATE
TRY IT #2: WORD-AT-A-TIME TRANSLATION DOESN’T WORK
MANY WORDS HAVE MULTIPLE SENSES (MEANINGS).
TRY IT: USE GOOGLE TRANSLATE TO TRANSLATE THE ENGLISH WORDS IN THIS CHART AND SEE HOW MANY DIFFERENT SPANISH WORDS EACH ONE CAN TRANSLATE TO.
Try it: Do these translations and see how many other words “quarter” can be translated into. (Click on Show more)
English | Spanish | Other Spanish translations |
Lead | | |
Port | | |
Quarter | | |
| | |
WHAT ARE IDIOMS?
IDIOMS ARE COMMONLY USED EXPRESSIONS WHOSE MEANING IS NOT THE LITERAL MEANING OF THE INDIVIDUAL WORDS.
IDIOMS CAN BE HARD TO TRANSLATE
SOME IDIOMS HAVE EQUIVALENTS IN OTHER LANGUAGES, BUT NOT ALL DO.
IF SOMETHING IS VERY EASY, WE COULD SAY:
GOOGLE KNOWS A LOT OF IDIOMS, BUT WHAT IF THERE IS NO EQUIVALENT IN THE OTHER LANGUAGE?
TRY IT #3: GOOGLE TRANSLATE IDIOMS
STEP 1: OPEN GOOGLE TRANSLATE IN A BROWSER HTTPS://TRANSLATE.GOOGLE.COM
STEP 2: CHOOSE 4 IDIOMS FROM THIS LIST OF 46 SPANISH IDIOMS, TRANSLATE THEM FROM SPANISH TO ENGLISH, AND SEE WHETHER GOOGLE TRANSLATE PRESERVES THE MEANING OR JUST TRANSLATES THE INDIVIDUAL WORDS.
Spanish Idiom | Literal Translation | Meaning | English Equivalent | Google’s translation | Does Google Translate it correctly? (Yes/No) |
No pegar un ojo | Not to strike an eye | Not being able to sleep | Without sleeping a wink | | |
Andar con pies de plomo | To walk with lead feet | To be very careful | To walk on eggshells | | |
| | | | | |
| | | | | |
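The difference between translating word by word and translating the whole idiom can be sketched in a few lines of Python. This is a toy illustration, not how Google Translate works: `WORD_DICTIONARY`, `IDIOM_TABLE`, and both function names are hypothetical, and the entries come from the two example rows in the table above.

```python
# Tiny word-for-word dictionary (illustrative; covers only the two example idioms).
WORD_DICTIONARY = {
    "no": "not", "pegar": "to strike", "un": "an", "ojo": "eye",
    "andar": "to walk", "con": "with", "pies": "feet",
    "de": "of", "plomo": "lead",
}

# Whole-phrase entries, like the ones humans can teach a translation system.
IDIOM_TABLE = {
    "no pegar un ojo": "without sleeping a wink",
    "andar con pies de plomo": "to walk on eggshells",
}


def translate_word_at_a_time(phrase):
    """Translate each word separately, ignoring the phrase as a whole."""
    words = phrase.lower().split()
    return " ".join(WORD_DICTIONARY.get(word, word) for word in words)


def translate_with_idioms(phrase):
    """Check for a known idiom first, then fall back to word-at-a-time."""
    key = phrase.lower().strip()
    if key in IDIOM_TABLE:
        return IDIOM_TABLE[key]
    return translate_word_at_a_time(key)


print(translate_word_at_a_time("No pegar un ojo"))
print(translate_with_idioms("No pegar un ojo"))
```

The first call gives the literal wording, and the second gives the English equivalent, because the idiom was looked up as a single unit.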
TRY IT: GOOGLE TRANSLATE IDIOMS, PART 2
STEP 3: LOOK UP 2 IDIOMS IN ANOTHER LANGUAGE TO TRANSLATE TO ENGLISH
FOR EXAMPLE, SEARCH FOR “TURKISH IDIOMS FOR KIDS”. ADDING “FOR KIDS” HELPS KEEP THE RESULTS SCHOOL-APPROPRIATE.
Language | Idiom | Literal Translation | Meaning | English Equivalent | Google’s translation | Does Google Translate it correctly? (Yes/No) |
| | | | | | |
| | | | | | |
Step 4: Reflect. How well did Google do? How comfortable would you feel using Google Translate to help you understand the meaning of a book or article?
ANOTHER THING THAT MAKES TRANSLATION HARD
LET’S LOOK AT TURKISH…
TURKISH LACKS GENDERED PRONOUNS
The translations are the same!
TURKISH LACKS GENDERED PRONOUNS
In earlier machine translation systems, “doctor” would be assumed male and “nurse” female.
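A short Python sketch can show why this is a problem. Turkish uses the single pronoun “o” where English uses “he” or “she”, so going from English to Turkish loses the gender, and going back means the system has to guess. The table and function names here are hypothetical and only illustrate the idea.

```python
# English "he" and "she" both map to the same Turkish pronoun, "o".
ENGLISH_TO_TURKISH_PRONOUN = {"he": "o", "she": "o"}


def to_turkish(english_pronoun):
    """Translate an English pronoun to Turkish (gender information is lost)."""
    return ENGLISH_TO_TURKISH_PRONOUN[english_pronoun]


def english_candidates(turkish_pronoun):
    """List every English pronoun that could have produced this Turkish one."""
    return sorted(
        english
        for english, turkish in ENGLISH_TO_TURKISH_PRONOUN.items()
        if turkish == turkish_pronoun
    )


print(to_turkish("he") == to_turkish("she"))  # True: the translations are the same
print(english_candidates("o"))                # ['he', 'she']: the system must choose
```

A system that always picks one candidate for “doctor” and the other for “nurse” is making the kind of assumption described above. Returning both candidates is one way to avoid guessing.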
TRY IT #4: TRANSLATE ENGLISH TO TURKISH
WHAT DID YOU NOTICE FROM YOUR EXPERIMENTS WITH GOOGLE TRANSLATE?
Discussion
HUMANS CAN HELP AI WITH HARD TASKS: “WORKING HARD OR HARDLY WORKING”
IN ENGLISH THIS IS A COLLOQUIALISM.
GOOGLE KNOWS THE SPANISH TRANSLATION BECAUSE HUMANS TAUGHT IT. IT ALSO KNOWS THE TURKISH TRANSLATION.
Many AI systems rely on crowd-sourcing (human-contributed inputs) to help them tackle complex tasks.
Colloquialism (kuh-LOH-kwee-uh-liz-um) is the use of informal, everyday language in writing. The word derives from the Latin colloquium, meaning “speaking together” or “conversation.” This includes slang, proverbs, and catch phrases.
Example: You only live once (YOLO)
HUMANS DIDN’T HELP OUT WITH JAVANESE
OTHER COMPANIES ALSO DO MACHINE TRANSLATION
TRY COMPARING GOOGLE’S TRANSLATION OF A PASSAGE WITH A TRANSLATION FROM DEEPL.COM
SEE WHICH ONE DOES A BETTER JOB
HOW DO YOU THINK GOOGLE TRANSLATES BETWEEN 112 DIFFERENT LANGUAGES?
Discussion
There are 112 × 111 = 12,432 combinations of “x to y” translation!
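The count above can be checked with a few lines of Python. The sketch lists every ordered “x to y” pair for a small set of languages, then applies the same formula to 112 languages. The function name `count_direct_translators` is hypothetical.

```python
from itertools import permutations


def count_direct_translators(num_languages):
    """Each language needs a translator to every other language."""
    return num_languages * (num_languages - 1)


# With four languages we can list every ordered "x to y" pair.
languages = ["English", "Spanish", "Turkish", "Javanese"]
pairs = list(permutations(languages, 2))
print(len(pairs))                      # 12 pairs for 4 languages (4 x 3)
print(count_direct_translators(112))   # 12432 pairs for 112 languages
```

English to Spanish and Spanish to English count as two different translators, which is why the pairs are ordered.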
HOW TO TRANSLATE BETWEEN 112 DIFFERENT LANGUAGES
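[Diagram: English, Spanish, Turkish, and Javanese text enters through “from English”, “from Spanish”, “from Turkish”, and “from Javanese” translators into a shared Language-Independent Meaning Representation. From there, “to English”, “to Spanish”, “to Turkish”, and “to Javanese” translators produce English, Spanish, Turkish, or Javanese text.]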
112 “x to meaning” translators.
112 “meaning to y” translators.
Only 224 translators needed, not 12,432!
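The idea of translating through a shared meaning can be sketched in Python. This is a toy model: real systems represent meaning with numbers learned from data, not with labels like "DOG". The tables `TO_MEANING` and `FROM_MEANING`, the concept labels, and the function names are all hypothetical, and only three languages and two words are included.

```python
# "x to meaning" tables: one per language.
TO_MEANING = {
    "English": {"dog": "DOG", "cat": "CAT"},
    "Spanish": {"perro": "DOG", "gato": "CAT"},
    "Turkish": {"köpek": "DOG", "kedi": "CAT"},
}

# "meaning to y" tables: built by reversing each table above.
FROM_MEANING = {
    language: {concept: word for word, concept in table.items()}
    for language, table in TO_MEANING.items()
}


def translate(word, source, target):
    """Translate by going source word -> shared meaning -> target word."""
    concept = TO_MEANING[source][word]
    return FROM_MEANING[target][concept]


def translators_needed(num_languages):
    """One 'x to meaning' and one 'meaning to y' translator per language."""
    return 2 * num_languages


print(translate("perro", "Spanish", "Turkish"))  # köpek
print(translators_needed(112))                   # 224
```

No Spanish-to-Turkish table exists anywhere in the sketch. The translation works because both languages connect to the same meaning in the middle.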
TRANSFORMER NEURAL NETWORKS
TRANSFORMERS are a kind of neural network designed for language tasks, such as machine translation, question answering, and text generation.
They are trained on very large amounts of text (millions of words or more), which lets them use statistics to infer the meanings of words from the words that surround them.
Word embeddings are the inputs to transformer networks.
Google uses transformer networks to understand search queries, and to translate text between languages.
[Diagram: Word Embeddings feed into the Transformer Network.]
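The idea of inferring meaning from surrounding words can be sketched with simple counting. This is not a transformer and not how real word embeddings are trained. It is a minimal count-based illustration, using an invented five-sentence corpus and hypothetical function names, of why words that appear in similar contexts end up with similar number lists (vectors).

```python
import math
from collections import Counter, defaultdict

# Invented mini-corpus for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]


def context_vectors(sentences, window=2):
    """For each word, count the words that appear within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine_similarity(a, b):
    """Similarity of two count vectors: higher means more alike (maximum 1.0)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(corpus)
cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
cat_car = cosine_similarity(vectors["cat"], vectors["car"])
print("cat vs dog:", round(cat_dog, 2))
print("cat vs car:", round(cat_car, 2))
```

In this corpus “cat” and “dog” share neighbors such as “drinks” and “chases”, so they score as more similar than “cat” and “car”, even though the program was never told what any of the words mean.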
TRY IT #5: TALK WITH TRANSFORMER
STEP #1: OPEN TALK TO TRANSFORMER (NOW INFERKIT): HTTPS://APP.INFERKIT.COM/DEMO
THIS DEMO TAKES A STORY PROMPT AS INPUT AND TRIES TO WRITE THE STORY.
STEP #2: TRY THIS PROMPT: “I TOOK MY DOG TO THE DOG PARK BUT THERE WERE ONLY CATS THERE.”
COPY AND PASTE THE STORY IT CREATES.
STEP #3: TRY A PROMPT OF YOUR OWN.
WRITE PROMPT HERE
STEP #4: COPY AND PASTE THE STORY IT CREATES.
STEP #5: REFLECT - WHAT DO YOU THINK OF THE STORY? DOES IT SEEM LIKE SOMETHING A PERSON WOULD WRITE?
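To get a feel for how a program can continue a prompt, here is a much simpler stand-in for the demo. It is not a transformer: it only looks at the last word and picks a word that followed it in a tiny invented training text. All names in the sketch are hypothetical.

```python
import random
from collections import defaultdict

# Invented training text for illustration only.
training_text = (
    "i took my dog to the park . my dog saw a cat . "
    "the cat ran to the park . a cat saw my dog ."
)


def build_next_word_table(text):
    """Record, for every word, the words that followed it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        table[current_word].append(next_word)
    return table


def generate(table, prompt, max_new_words=8, seed=0):
    """Extend the prompt one word at a time using the table."""
    rng = random.Random(seed)  # fixed seed so the same story comes out each run
    story = prompt.split()
    for _ in range(max_new_words):
        choices = table.get(story[-1])
        if not choices:
            break  # the last word never appeared with a follower in training
        story.append(rng.choice(choices))
    return " ".join(story)


table = build_next_word_table(training_text)
story = generate(table, "my dog")
print(story)
```

Because this sketch only remembers one word of context, its stories wander quickly. Transformer networks look at many surrounding words at once, which is one reason their stories tend to hold together better.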
TAKEAWAYS