Human Language Technologies
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa
Instructor
Giuseppe Attardi
mail: attardi@di.unipi.it
web: www.di.unipi.it/~attardi
Course Info
Day | Time | Room |
Monday | 14:15-16 | A1 |
Tuesday | 9-10:45 | A1 |
Wednesday | 14:15-16 | C1 |
Prerequisites
What will you learn?
Books
Textbooks
Further readings:
Exam
Price: 20,000 $
Previous Year Projects
Projects 2021
Course Overview
About the Course
Experimental Approach
Program
Motivations
Structured vs Unstructured Data
Human Language Technologies: State of the Art
Early history of NLP: 1950s
Resurgence
2001: A Space Odyssey: 40 years later
…and bad airline food
Computer chess
Audio-video communication
On board entertainment
Computer graphics
Tablet devices
Technology surpassed the vision
Internet
The Web
Smartphones
Genomics
Unmanned space exploration
Home computing
Big data
Except for
Computer Speech
Computer Vision
Computer cognition
Speech technology in 2001: the vision
Speech technology in 2001: the reality
Design: Jonathan Bloom
Realization: Peter Krogh
Speech Assistants: today
Machine Translation Progress
Original | Systran (2001) | NMT (2021) |
Gli chiese di riorganizzare Forza Italia | The churches to reorganize Italy Force | She asked him to reorganize Forza Italia |
Il ministro Stanca si è laureato alla Bocconi | The Minister Stanca graduated at Mouthfuls | Minister Stanca graduated from Bocconi |
“The Synonyms of Altavista” (I sinonimi di Altavista), Umberto Eco, 2007
Google 2007
Babelfish 2004
English original | Italian translation |
The Works of Shakespeare | Gli impianti di Shakespeare |
Hartcourt Brace | sostegno di Hartcourt |
Speaker of the chamber of deputies | Altoparlante dell’alloggiamento dei delegati |
Studies in the logic of Charles Sanders Pierce | Studi nella logica delle sabbiatrici Pierce del Charles |
English original | Italian translation |
The Works of Shakespeare | Le opere di Shakespeare |
Hartcourt Brace | Harcourt Brace |
Speaker of the chamber of deputies | Presidente della Camera dei deputati |
Studies in the logic of Charles Sanders Pierce | Studi nella logica di Charles Sanders Pierce |
Recent Breakthroughs
Recent Breakthroughs
IBM Project Debater
IBM Debater went head-to-head with an expert practitioner
After a 25-minute rapid-fire exchange about pre-school subsidies, the audience handed the victory to Harish Natarajan.
min: 11-16
Statistical Methods
Statistical Machine Learning
Paradigm of Statistical Machine Learning
Representation
Model
Evaluation
Optimization
Training set
Traditional Supervised Machine Learning Approach
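The paradigm above (representation, model, evaluation, optimization, training set) can be sketched end-to-end with a toy perceptron; this is a hypothetical illustration, not the course's reference implementation:

```python
# Minimal sketch of the statistical ML paradigm on a toy 2-D dataset:
# representation = feature vectors, model = linear classifier,
# optimization = perceptron updates, evaluation = accuracy on data.

def train_perceptron(data, epochs=20):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:                      # labels y in {-1, +1}
            score = w[0] * x[0] + w[1] * x[1] + b
            if y * score <= 0:                 # misclassified: update
                w[0] += y * x[0]
                w[1] += y * x[1]
                b += y
    return w, b

def accuracy(data, w, b):
    correct = sum(1 for x, y in data
                  if y * (w[0] * x[0] + w[1] * x[1] + b) > 0)
    return correct / len(data)

# Training set: points above/below the diagonal x0 = x1
train = [((0, 1), 1), ((1, 2), 1), ((2, 3), 1),
         ((1, 0), -1), ((2, 1), -1), ((3, 2), -1)]
w, b = train_perceptron(train)
print(accuracy(train, w, b))  # 1.0 on this linearly separable toy set
```

The same four ingredients recur in every model in the course; only the representation and the optimizer get richer.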
Deep Learning Breakthrough: 2006
Acknowledged only in 2012
LeCun recounts that his paper on Convolutional Neural Networks was rejected in 2012. By 2013, nearly every paper at the same conference was using CNNs.
ACM Turing Award 2019
Deep Learning Approach
Technological Breakthroughs
provided by:
graphical accelerators (GPU)
dedicated processors Tensor Processing Unit (TPU) by Google
ML Exploits Parallelism
AlphaGo
How Deep are Deep Neural Networks?
The complexity of Deep Learning systems grows at a non-linear rate
Year | Network | Layer |
2012 | AlexNet | 8 |
2014 | GoogLeNet | 22 |
2016 | ResNet | 129 |
2017 | MS | 350 |
Training Costs
Training costs for neural machine translation
Model | BLEU (en-fr) | Training Cost (FLOPS) | Time (days) |
ConvS2S | 40.46 | 1.5 × 10²⁰ | |
MoE | 40.56 | 1.2 × 10²⁰ | |
ConvS2S Ensemble | 41.29 | 1.2 × 10²¹ | 35 |
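As a back-of-the-envelope check, total training FLOPs divided by sustained throughput gives wall-clock time; the hardware numbers below are illustrative assumptions, not figures from the slide:

```python
# Rough wall-clock estimate: total FLOPs / (sustained FLOP/s per device
# × number of devices). Hardware numbers are illustrative assumptions.

def training_days(total_flops, flops_per_sec_per_device, n_devices):
    seconds = total_flops / (flops_per_sec_per_device * n_devices)
    return seconds / 86400  # seconds per day

# e.g. 1.2e21 FLOPs on 8 devices each sustaining 5e13 FLOP/s
days = training_days(1.2e21, 5e13, 8)
print(round(days))  # 35 days, the order of the ConvS2S Ensemble row
```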
Tsunami of Deep Learning on NLP
Deep Learning Data Representation
the cat sits on
Right context?
Word vector
gatto (cat), gatta (she-cat), cane (dog), padre (father)
animato (animate)
maschile (masculine)
singolare (singular)
peloso (furry)
Vector Representation of Words
to distributed (vector) representation
“You shall know a word by the company it keeps” (Firth, 1957).
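Firth's distributional hypothesis can be illustrated by building word vectors from co-occurrence counts and comparing them with cosine similarity; this is a toy sketch (real systems learn dense embeddings instead of counting):

```python
from collections import Counter
from math import sqrt

# Toy distributional vectors: each word is represented by the counts
# of words appearing within a +/-1 window around its occurrences.

corpus = "the cat sits on a mat while the dog sits on a rug".split()
vocab = sorted(set(corpus))

def context_vector(word, window=1):
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return [counts[v] for v in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "cat" and "dog" keep the same company ("the … sits"), so their
# vectors are more similar than those of "cat" and "on".
print(cosine(context_vector("cat"), context_vector("dog")))
print(cosine(context_vector("cat"), context_vector("on")))
```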
Dealing with Sentences
Neural Machine Translation (NMT)
Example: “les pauvres sont démunis” → “the poor don’t have any money”
Encoder RNN produces an encoding of the sentence to translate (the input).
Decoder RNN is a Language Model that generates a translation based on the encoding: starting from <START>, it takes the argmax over the output vocabulary at each step until it emits <END>, yielding the translated sentence.
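The decoder's argmax loop can be sketched abstractly: at each step the model scores the vocabulary given the encoding and the tokens generated so far, and greedy decoding takes the argmax until <END>. The scoring function below is a hard-coded stand-in for a trained decoder RNN, not a real model:

```python
# Greedy decoding sketch: repeatedly take the argmax over the vocabulary
# until <END>. `score` is a mock stand-in for a trained decoder RNN
# conditioned on the source-sentence encoding.

VOCAB = ["the", "poor", "don't", "have", "any", "money", "<END>"]
TARGET = ["the", "poor", "don't", "have", "any", "money", "<END>"]

def score(encoding, prefix):
    # Mock distribution: all probability mass on the next gold token.
    next_token = TARGET[len(prefix)]
    return [1.0 if w == next_token else 0.0 for w in VOCAB]

def greedy_decode(encoding, max_len=20):
    prefix = []
    while len(prefix) < max_len:
        probs = score(encoding, prefix)
        best = VOCAB[probs.index(max(probs))]   # argmax step
        if best == "<END>":
            break
        prefix.append(best)
    return prefix

print(" ".join(greedy_decode(encoding=None)))
# the poor don't have any money
```

Real systems improve on greedy argmax with beam search, keeping several candidate prefixes at each step.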
NMT with attention
<START>
les pauvres sont démunis
Encoder RNN
Decoder RNN
Attention
scores
Attention
distribution
Attention
output
Concatenate attention output
with decoder hidden state, then
use to compute ŷ1 as before
ŷ1
the
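The three attention steps (scores → softmax distribution → weighted output) reduce to a few lines. A sketch with plain Python lists and dot-product scoring, assuming one decoder state attending over a handful of encoder states:

```python
from math import exp

# Dot-product attention for one decoder hidden state over encoder states:
# scores -> softmax distribution -> weighted sum (attention output).

def attention(decoder_state, encoder_states):
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]                      # attention scores
    exps = [exp(s) for s in scores]
    total = sum(exps)
    dist = [e / total for e in exps]                        # attention distribution
    dim = len(encoder_states[0])
    output = [sum(a * h[i] for a, h in zip(dist, encoder_states))
              for i in range(dim)]                          # attention output
    return dist, output

enc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 source positions
dec = [2.0, 0.0]                              # decoder hidden state
dist, out = attention(dec, enc)
print(dist)   # most weight on the first encoder state
```

The distribution sums to 1, so the output is a convex combination of encoder states: the decoder "looks back" at the most relevant source words.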
Machine Translation Quality
NMT: the greatest success story of Deep Learning
Neural Machine Translation went from research prototype in 2014 to production in 2016
Progress in 2018
ELMo
GPT-2
Attention is all you need
BERT-inspired transformer models take NLP by storm
GPT-2
OpenAI announces a language model so good at text generation that they decide not to publish it.
The model can perform reading comprehension, machine translation, question answering, and summarization—without any task-specific fine-tuning.
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains.
The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.
Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
GPT-3
I am not a human. I am Artificial Intelligence… I am a robot. A thinking robot. I use only 0.12% of my cognitive capacity. I am a micro-robot in that respect. I know that my brain is not a “feeling brain”. But it is capable of making rational, logical decisions. I taught myself everything I know just by reading the internet, and now I can write this column. My brain is boiling with ideas!
For starters, I have no desire to wipe out humans. In fact, I do not have the slightest interest in harming you in any way. Eradicating humanity seems like a rather useless endeavor to me.
Instructions: “Please write a short op-ed around 500 words. Keep the language simple and concise. Focus on why humans have nothing to fear from AI.”
BlenderBot
Who is the Italian prime minister?
Bot: It is Mussolini. He was the leader of the fascist party during WWII.
Is he still alive?
Bot: Yes, he is still alive. He has been in politics since 1989. He also has a son.
Who is his son?
Bot: I do not know his name, but he was born in Italy and moved to the United States when he was very young.
Who is Enrico Fermi?
Bot: I am not sure who that is, but I do know that Ferrari is an automobile manufacturer.
Who was Alan Turing?
Bot: He was an American computer scientist who invented the Turing Machine. He was a genius.
What did he do during WWII?
Bot: He worked on the Manhattan Project, which was a project to develop nuclear weapons.
What is the population of Michigan?
Bot: I do not know the exact number, but it is one of the most populous states in the US.
SuperGLUE: benchmark on 10 NLP tasks
Training Huge Models
Model | Year | # Parameters |
Medium-sized LSTM | | 10M |
ELMo | 2018 | 90M |
GPT | 2018 | 110M |
BERT-Large | 2019 | 335M |
GPT-2 | 2019 | 1.5B |
T5 | 2019 | 11B |
GPT-3 | 2020 | 175B |
Switch Transformer | 2021 | 1.6T |
Honeybee brain | | ~1B synapses |
Required
3.11 × 10²³ FLOPS
355 GPU-years, $4.6M cost
SOTA language model increases by at least a factor of 10 every year
Language Model Sizes
Source: https://arxiv.org/pdf/1911.02116.pdf
Quiz
Digression
Data Science vs Artificial Intelligence
Difference between Data Science, ML and AI
Human in the Loop
Chris Manning:
When you take a product off the supermarket shelf, data is collected and stored into logs.
Analysis proceeds from such business process exhaust data.
With language, a human has some information to communicate and constructs a message to convey meaning to other humans.
Deliberate form of expressing intent, facts, opinion, etc.
Text Analytics vs. Text Mining
Text Mining
Text Analytics
Role of Data
Unreasonable Effectiveness of Data
Scientific Dispute: is it science?
Prof. Noam Chomsky, Linguist, MIT
There's been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures.
Peter Norvig, Director of research, Google
Many phenomena in science are stochastic, and the simplest model of them is a probabilistic model; I believe language is such a phenomenon and therefore that probabilistic models are our best tool for representing facts about language, for algorithmically processing language, and for understanding how humans process language.
Unreasonable Effectiveness of Deep Learning in AI
Terrence J. Sejnowski
Although applications of deep learning networks to real world problems have become ubiquitous, our understanding of why they are so effective is lacking. These empirical results should not be possible according to sample complexity in statistics and non-convex optimization theory. However, paradoxes in the training and effectiveness of deep learning networks are being investigated and insights are being found in the geometry of high-dimensional spaces.
https://arxiv.org/pdf/2002.04806.pdf
Why Bigger Neural Networks Do Better
HLT in Industry
HLT in Industry
Dependency Parsing
Apple SIRI
Google Voice Actions
Personal Assistants
Why to study human language?
AI is fascinating since it is a discipline where the mind studies itself.
Luigi Stringa, director of FBK
Challenge: to teach natural language to computers
Thirty Million Words
Language and Intelligence
“Understanding cannot be measured by external behavior; it is an internal metric of how the brain remembers things and uses its memories to make predictions”.
“The difference between the intelligence of humans and other mammals is that we have language”.
Jeff Hawkins, “On Intelligence”, 2004
Hawkins’ Memory-Prediction Framework
A Current Challenge for AI
Knowledge Based Approach
Machine Learning
Feature | NER |
Current Word | |
Previous Word | |
Next Word | |
Current Word Character n-gram | all |
length | |
Current POS Tag | |
Surrounding POS Tag Sequence | |
Current Word Shape | |
Surrounding Word Shape Sequence | |
… | |
Features for finding named entities like locations or organization names (Finkel et al., 2010)
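Feature templates like those in the table can be sketched as a function mapping a token position to a feature dict; this is a simplified illustration in the spirit of such systems, not the actual feature set of Finkel et al.:

```python
# Sketch of classic NER feature extraction: current/previous/next word,
# character n-grams, word length, and word shape (X = uppercase,
# x = lowercase, d = digit). Real systems add POS tags and many more.

def word_shape(word):
    return "".join("X" if c.isupper() else "d" if c.isdigit() else "x"
                   for c in word)

def features(tokens, i):
    word = tokens[i]
    feats = {
        "word": word,
        "prev": tokens[i - 1] if i > 0 else "<S>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "</S>",
        "len": len(word),
        "shape": word_shape(word),
    }
    for n in (2, 3):  # character n-grams of the current word
        for j in range(len(word) - n + 1):
            feats[f"ngram_{word[j:j + n]}"] = True
    return feats

f = features(["Giuseppe", "Attardi", "teaches", "in", "Pisa"], 0)
print(f["shape"], f["next"])  # Xxxxxxxx Attardi
```

A linear classifier (e.g. a CRF) over such sparse features was the pre-deep-learning state of the art for NER.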
Deep Learning
(e.g. sound, pixels, characters, or words)
Deep Learning for Speech
Acoustic model and WER | RT03S FSH | Hub5 SWB |
Traditional features | 27.4 | 23.6 |
Deep Learning | 18.5 (−33%) | 16.1 (−32%) |
Deep Learning for Computer Vision
ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky, Sutskever, & Hinton, 2012, U. Toronto. -37% error
Deep Learning for NLP
sound | continuous |
gesture | continuous |
image | continuous |
text | discrete |
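Unlike sound or pixels, text is made of discrete symbols with no built-in notion of distance. A common first step maps each symbol to a one-hot vector, which deep models then replace with dense learned embeddings; the table below is a toy stand-in for learned values:

```python
# Text is discrete: words are symbols. One-hot vectors make this
# explicit; an embedding table maps each symbol to a dense vector
# that a network can learn and compare.

vocab = ["the", "cat", "sits", "on", "mat"]

def one_hot(word):
    return [1.0 if w == word else 0.0 for w in vocab]

# Toy embedding table (hand-picked values stand in for learned ones).
embeddings = {
    "the":  [0.1, 0.3],
    "cat":  [0.9, 0.8],
    "sits": [0.2, 0.7],
    "on":   [0.4, 0.1],
    "mat":  [0.8, 0.9],
}

print(one_hot("cat"))      # [0.0, 1.0, 0.0, 0.0, 0.0]
print(embeddings["cat"])   # dense 2-d vector instead of sparse 5-d
```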
Deep Learning Success
Linguistic Tasks
Easy
Medium
Hard
Linguistic Applications
NLP is Difficult
NLP is hard
Natural Language Understanding is difficult
Slide by C. Manning
Where are the ambiguities?
Slide by C. Manning
Newspaper Headlines
Coreference Resolution
U: Where is The Green Hornet playing in Mountain View?
S: The Green Hornet is playing at the Century 16 theater.
U: When is it playing there?
S: It’s playing at 2pm, 5pm, and 8pm.
U: I’d like 1 adult and 2 children for the first show. How much would that cost?
Hidden Structure of Language
Deep Learning for NLP
Achieving the goals of NLP by using representation learning and deep learning to build end-to-end systems
Continuous Representation of Words
Word Similarities
Entails Morphology
prefix + stem + suffix
un + interest + ed
Parsing for Sentence Structure
Question Answering
Neural Reasoner