Learn Natural Language Processing the Hard Way*
What People Want
What We Have
A few examples of why APIs are bad…
Actual translation to German: Mir ist heiß.
Sentiment Analyzers
“I’d like to change my address”
Should NOT be a Negative statement
Don’t get mad
Learn your toolbox
Define your parameters
80% of the work: data
For playing around: some good options
Enterprise use cases, on the other hand…
Labeling
Is it blue?
Is it purple?
Is it Blue Iris?
When does it matter?
Taxonomy
Three categories – PayPal, Venmo, Credit Card
One category – Payment Method
Good Labels for Machines
Good Labels for People
Clustering
Annotation
Who should annotate what
When to Stop
Annotator Quality
Category Quality
Aggregation
Feature Engineering
     | This | Is | the | rat | cat | that | ate | killed | malt |
Doc1 | 1    | 1  | 1   | 1   | 0   | 1    | 1   | 0      | 1    |
Doc2 | 1    | 1  | 1   | 1   | 1   | 1    | 0   | 1      | 0    |
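A minimal sketch of how a binary bag-of-words table like the one above could be built. The two sentences are assumptions chosen to reproduce the table's rows; the slides do not give the original documents. The helper name `binary_bow` is hypothetical.

```python
# Assumed example documents; the slides only show the resulting table.
docs = {
    "Doc1": "This is the rat that ate the malt",
    "Doc2": "This is the cat that killed the rat",
}

# Vocabulary in the same column order as the table (lowercased).
vocab = ["this", "is", "the", "rat", "cat", "that", "ate", "killed", "malt"]


def binary_bow(text, vocab):
    """Return 1 for each vocabulary term present in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocab]


matrix = {name: binary_bow(text, vocab) for name, text in docs.items()}

for name, row in matrix.items():
    print(name, row)
# Doc1 [1, 1, 1, 1, 0, 1, 1, 0, 1]
# Doc2 [1, 1, 1, 1, 1, 1, 0, 1, 0]
```

Each feature here records presence only, so "the" scores 1 even though it occurs twice in each sentence. Storing counts instead would be a small change to `binary_bow`.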
Feature Engineering
Algorithms 1
Deep learning and Keras
Machine Learning Libraries
NLTK vs. spaCy
http://blog.thedataincubator.com/2016/04/nltk-vs-spacy-natural-language-processing-in-python/
Skip-thoughts
https://github.com/ryankiros/skip-thoughts