Course Overview
Human Language Technologies
Dipartimento di Informatica
Giuseppe Attardi
Università di Pisa
About the Course
Experimental Approach
Topics
Program
Digression
Data Science vs Artificial Intelligence
Difference between Data Science, ML and AI
Human in the Loop
Chris Manning:
When you take a product off the supermarket shelf, data is collected and stored into logs.
Analysis proceeds from such business process exhaust data.
With language, a human has some information to communicate and constructs a message to convey meaning to other humans.
Deliberate form of expressing intent, facts, opinion, etc.
Text Analytics vs. Text Mining
Text Mining
Text Analytics
Role of Data
Unreasonable Effectiveness of Data
Scientific Dispute: is it science?
Prof. Noam Chomsky, Linguist, MIT
There's been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures.
Peter Norvig, Director of research, Google
Many phenomena in science are stochastic, and the simplest model of them is a probabilistic model; I believe language is such a phenomenon and therefore that probabilistic models are our best tool for representing facts about language, for algorithmically processing language, and for understanding how humans process language.
Unreasonable Effectiveness of Deep Learning in AI
Terrence J. Sejnowski
Although applications of deep learning networks to real world problems have become ubiquitous, our understanding of why they are so effective is lacking. These empirical results should not be possible according to sample complexity in statistics and non-convex optimization theory. However, paradoxes in the training and effectiveness of deep learning networks are being investigated and insights are being found in the geometry of high-dimensional spaces.
https://arxiv.org/pdf/2002.04806.pdf
Why Bigger Neural Networks Do Better
HLT in Industry
Dependency Parsing
Apple SIRI
Google Voice Actions
Personal Assistants
Why study human language?
AI is fascinating since it is a discipline where the mind studies itself.
Luigi Stringa, director FBK
Challenge: to teach natural language to computers
Thirty Million Words
Language and Intelligence
“Understanding cannot be measured by external behavior; it is an internal metric of how the brain remembers things and uses its memories to make predictions”.
“The difference between the intelligence of humans and other mammals is that we have language”.
Jeff Hawkins, “On Intelligence”, 2004
Hawkins’ Memory-Prediction Framework
A Current Challenge for AI
Knowledge Based Approach
Machine Learning
Feature | NER |
Current Word | |
Previous Word | |
Next Word | |
Current Word Character n-gram | all |
length | |
Current POS Tag | |
Surrounding POS Tag Sequence | |
Current Word Shape | |
Surrounding Word Shape Sequence | |
… | |
Features for finding named entities like locations or organization names (Finkel et al., 2010)
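The feature templates listed in the table can be sketched in a few lines of Python. This is only an illustration of hand-engineered features, not the implementation from Finkel et al.: the function names (`word_shape`, `char_ngrams`, `token_features`), the boundary symbols `<s>` and `</s>`, the n-gram range, and the example POS tags are all assumptions made for the sketch.

```python
import re

def word_shape(token):
    """Map uppercase -> X, lowercase -> x, digit -> d, then collapse repeated classes."""
    classes = []
    for ch in token:
        if ch.isupper():
            classes.append("X")
        elif ch.islower():
            classes.append("x")
        elif ch.isdigit():
            classes.append("d")
        else:
            classes.append(ch)  # keep punctuation as-is
    return re.sub(r"(.)\1+", r"\1", "".join(classes))

def char_ngrams(token, max_n=3):
    """All character n-grams of length 2..max_n (the range is an assumption)."""
    return [token[i:i + n]
            for n in range(2, max_n + 1)
            for i in range(len(token) - n + 1)]

def token_features(tokens, pos_tags, i):
    """Build a feature dictionary for the token at position i."""
    def at(seq, j):
        # Hypothetical boundary symbols for positions outside the sentence.
        if j < 0:
            return "<s>"
        if j >= len(seq):
            return "</s>"
        return seq[j]

    shapes = [word_shape(t) for t in tokens]
    return {
        "word": tokens[i],
        "prev_word": at(tokens, i - 1),
        "next_word": at(tokens, i + 1),
        "char_ngrams": char_ngrams(tokens[i]),
        "length": len(tokens[i]),
        "pos": pos_tags[i],
        "pos_seq": " ".join(at(pos_tags, j) for j in (i - 1, i, i + 1)),
        "shape": shapes[i],
        "shape_seq": " ".join(at(shapes, j) for j in (i - 1, i, i + 1)),
    }

tokens = ["Giuseppe", "works", "in", "Pisa"]
pos_tags = ["NNP", "VBZ", "IN", "NNP"]  # illustrative tags, not tagger output
features = token_features(tokens, pos_tags, 3)
```

A classifier would see one such dictionary per token; every entry had to be designed by hand, which is the cost the deep learning approach tries to avoid.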
Deep Learning
(e.g. sound, pixels, characters, or words)
Deep Learning for Speech
Acoustic model and WER | RT03S FSH | Hub5 SWB |
Traditional features | 27.4 | 23.6 |
Deep Learning | 18.5 (−33%) | 16.1 (−32%) |
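For reference, word error rate (WER) is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal version of that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions, and it is not the scoring tool used to produce the figures in the table.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("the green hornet is playing", "the green hornet playing")
```

Here one reference word out of five is missing from the hypothesis, so `wer` is 0.2, i.e. 20%.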
Deep Learning for Computer Vision
ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky, Sutskever, & Hinton, 2012, U. Toronto. -37% error
Deep Learning for NLP
sound | continuous |
gesture | continuous |
image | continuous |
text | discrete |
Deep Learning Success
Linguistic Tasks
Easy
Medium
Hard
Linguistic Applications
NLP is Difficult
NLP is hard
Natural Language Understanding is difficult
Slide by C. Manning
Where are the ambiguities?
Slide by C. Manning
Newspaper Headlines
Coreference Resolution
U: Where is The Green Hornet playing in Mountain View?
S: The Green Hornet is playing at the Century 16 theater.
U: When is it playing there?
S: It’s playing at 2pm, 5pm, and 8pm.
U: I’d like 1 adult and 2 children for the first show. How much would that cost?
Hidden Structure of Language
Deep Learning for NLP
Achieving the goals of NLP by using representation learning and deep learning to build end-to-end systems
Entails Morphology
Prefix | Stem | Suffix |
un | interest | ed |
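The prefix/stem/suffix split above can be illustrated with a naive affix-stripping sketch. The affix lists `PREFIXES` and `SUFFIXES` and the minimum stem length are hypothetical choices for this example only; real morphological analyzers rely on lexicons or learned models rather than string matching.

```python
# Hypothetical, deliberately tiny affix inventories for illustration.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ed", "ing", "s")
MIN_STEM = 3  # assumed minimum number of characters left after stripping an affix

def segment(word):
    """Split a word into (prefix, stem, suffix) by greedy affix stripping."""
    prefix = next((p for p in PREFIXES
                   if word.startswith(p) and len(word) - len(p) >= MIN_STEM), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES
                   if rest.endswith(s) and len(rest) - len(s) >= MIN_STEM), "")
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix

parts = segment("uninterested")
```

For "uninterested" this yields the split shown on the slide. The same rules wrongly split words such as "under" into "un" + "der", which is one reason morphology is treated as a learning problem rather than a lookup.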
Parsing for Sentence Structure
Question Answering
Neural Reasoner