The Human Factor
5 November 2015
Language Intelligence
Jana Thompson, NLP Engineer
Machines versus Humans
Human cost goes up, machine cost goes down
Source: John C. McCallum, Wikipedia, Federal Reserve Bank of St Louis. Inflation adjusted to 2011 dollars.
Processing language data
Every machine learning pipeline, ever
Adaptive Learning
Microtasking
Deep Learning
Active Learning
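Of the approaches listed above, active learning is the simplest to show in a few lines. The sketch below assumes uncertainty sampling (one common active-learning strategy, not necessarily the one used in this talk): the model scores unlabeled items, and the items it is least sure about go to human annotators first. The function name and the scores are hypothetical.

```python
def least_confident(probabilities, k):
    """Return indices of the k items whose top class probability is lowest."""
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:k]

# Hypothetical model outputs for four unlabeled items: [P(class A), P(class B)].
scores = [[0.98, 0.02], [0.55, 0.45], [0.10, 0.90], [0.51, 0.49]]

# The two items the model is least sure about are queued for human annotation.
to_annotate = least_confident(scores, k=2)
```

Confident predictions are left alone, so annotator time is spent where a human label is most likely to change the model.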
Positive about Ford
Also positive about Ford…
Will the real Ford car please stand up?
Data beats algorithms; feedback beats data
Results on distinguishing the correct ‘Ford’
Distinguishing “Ford” the company from people called “Ford”
Adaptive System
Human Annotation
Machine Learning
Optimization
Prediction Engine
Positive, about Ford cars…but relevant?
Idibon’s car-sentiment analytics correlate with actual sales
95% accuracy in identifying people talking about buying cars on social media
Adaptive System
Human Annotation
Machine Learning
Optimization
Prediction Engine
Annotators aren’t infallible
von Ahn, Luis. 2006. Games With a Purpose. https://www.cs.cmu.edu/~biglou/ieee-gwap.pdf
Reliable data => a better model
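One way to turn fallible annotations into more reliable training data is to collect several labels per item and keep only the items where annotators mostly agree. The sketch below assumes simple majority voting with an agreement threshold; the function name, the 0.6 threshold, and the sample labels are all hypothetical, not taken from the talk.

```python
from collections import Counter

def aggregate(labels, min_agreement=0.6):
    """Majority-vote one item's labels.

    Returns (label, agreement); label is None when agreement is too low.
    """
    winner, votes = Counter(labels).most_common(1)[0]
    agreement = votes / len(labels)
    return (winner if agreement >= min_agreement else None), agreement

# Hypothetical annotations: is "Ford" the company or a person?
annotations = {
    "post-1": ["company", "company", "company"],
    "post-2": ["company", "person", "person"],
    "post-3": ["company", "person"],
}

# Keep only items whose annotators agree strongly enough.
reliable = {}
for item, labels in annotations.items():
    label, agreement = aggregate(labels)
    if label is not None:
        reliable[item] = label
```

Items that fall below the threshold are not discarded for good; they can be sent back for more annotation before they are used for training.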
How does the analyst know when to stop?
People are always going to be central
Machines cluster; humans label
Good
Co-workers: 3,845
Pay and Opportunities for Advancement: 2,042
Management: 490
Benefits: 657
Machines sort; humans are multilingual
Word cloud of SMS terms, mixing French and Kinyarwanda: umukobwa, sexuels, medicaments, épidémie, commuement, protegér, ebola, prevention, sida, aladie ici, kumenya, lyce, droit, kwiga, concerne, inyigisho
UNICEF utilizes Idibon to process millions of SMS in 12 African languages.
SENDER INTENT
CATEGORIZATION
LANGUAGE DETECTION
LOCATION
In conclusion…