1 of 32

Learn Natural Language Processing the Hard Way*

*A total nod to Zed Shaw, who wrote the excellent Learn Python the Hard Way.

2 of 32

What People Want

3 of 32

What We Have

4 of 32

A few examples of why off-the-shelf NLP APIs are bad…

Correct German translation: Mir ist heiß.

5 of 32

Sentiment Analyzers

“I’d like to change my address”

Should NOT be classified as negative.

6 of 32

Don’t get mad

7 of 32

Learn your toolbox

8 of 32

Define your parameters

  • What question do I need to answer?
  • Which metrics will tell me when I’m done?
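A minimal sketch of making the second question concrete: fix a metric and a target before you start, then check against it. Per-label F1, the 0.8 target, and the helper names are assumptions for illustration, not recommendations from the talk; the labels in the usage lines are hypothetical.

```python
def precision_recall_f1(gold, predicted, label):
    """Per-label precision, recall and F1 from parallel lists of labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

TARGET_F1 = 0.8  # hypothetical "done" threshold; pick yours before starting

def is_done(gold, predicted, label, target=TARGET_F1):
    """True once the per-label F1 reaches the agreed target."""
    _, _, f1 = precision_recall_f1(gold, predicted, label)
    return f1 >= target

gold = ["neg", "pos", "neg", "neutral", "neg"]
pred = ["neg", "neg", "neg", "neutral", "pos"]
print(precision_recall_f1(gold, pred, "neg"))
print(is_done(gold, pred, "neg"))
```

Writing the target down first keeps "done" from drifting as results come in.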

9 of 32

80% of the work: data

10 of 32

For playing around, some good options:

11 of 32

Enterprise use cases, on the other hand…

  • Marketing
  • Customer support
  • Legal compliance
  • Customer satisfaction

12 of 32

Labeling

Is it blue?

Is it purple?

Is it Blue Iris?

When does it matter?

13 of 32

Taxonomy

14 of 32

Taxonomy

Three categories – PayPal, Venmo, Credit Card

One category – Payment Method
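One way to keep both options open is to annotate at the finer grain and collapse afterwards. A minimal sketch; the mapping dictionary and the collapse helper are hypothetical:

```python
# Fine-grained labels mapped to a coarser taxonomy category (hypothetical mapping).
COARSE_CATEGORY = {
    "PayPal": "Payment Method",
    "Venmo": "Payment Method",
    "Credit Card": "Payment Method",
}

def collapse(labels, mapping=COARSE_CATEGORY):
    """Replace each fine label with its coarse category; unmapped labels pass through."""
    return [mapping.get(label, label) for label in labels]

print(collapse(["PayPal", "Venmo", "Shipping"]))  # ['Payment Method', 'Payment Method', 'Shipping']
```

Going the other way, splitting "Payment Method" back into three categories, is not possible without re-annotating, which is why the finer grain is usually the safer starting point.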

15 of 32

Good Labels for Machines

  • Very Positive
  • Positive
  • Neutral
  • Negative
  • Very Negative

16 of 32

Good Labels for People

  • Positive
  • Neutral
  • Negative

17 of 32

Good Labels for People

  • Emphatic
  • Non-emphatic
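Read together, the three label lists above suggest one possible design (an interpretation, not stated on the slides): ask people two easy questions, polarity and whether the statement is emphatic, and derive the five-point machine label from the answers. A minimal sketch; the function name and the rule that emphasis is ignored for Neutral items are assumptions:

```python
def machine_label(polarity, emphatic):
    """Combine a 3-way human polarity label and an emphatic flag into a 5-way label."""
    if polarity not in ("Positive", "Neutral", "Negative"):
        raise ValueError(f"unknown polarity: {polarity!r}")
    if polarity == "Neutral":
        return "Neutral"  # assumption: emphasis has no effect on neutral items
    return f"Very {polarity}" if emphatic else polarity

print(machine_label("Negative", emphatic=True))   # Very Negative
print(machine_label("Positive", emphatic=False))  # Positive
```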

18 of 32

Clustering

19 of 32

Annotation

20 of 32

Who should annotate what

  • Trendy
  • Conservative
  • Casual
  • Evening wear
  • Business wear

21 of 32

Who should annotate what

22 of 32

When to Stop

23 of 32

Annotator Quality
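The slide names no method. One common way to quantify annotator quality is chance-corrected agreement between two annotators on the same items, Cohen's kappa: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A minimal sketch (the colour labels in the usage line are hypothetical):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators who labeled the same items, in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement if each annotator picked labels independently,
    # at their own observed label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined, treat as perfect
    return (observed - expected) / (1 - expected)

print(cohens_kappa(["blue", "blue", "purple", "purple"],
                   ["blue", "purple", "purple", "purple"]))  # 0.5
```

Here the annotators agree on 3 of 4 items (p_o = 0.75), but half of that is expected by chance (p_e = 0.5), so κ = 0.5.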

24 of 32

Category Quality

25 of 32

Aggregation

  • Blue
  • Purple
  • Yellow
  • Blue
  • Purple
  • Yellow
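A minimal sketch of the simplest aggregation scheme, taking the most common label per item (plurality vote); the slide names no method, so this choice is an assumption. If the six labels listed above are read as votes on a single item, each colour gets two and the result is a tie, so the sketch returns None and leaves the item for another annotator:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label, or None if the top count is tied."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no winner, route the item to another annotator
    return ranked[0][0]

print(majority_vote(["Blue", "Purple", "Blue"]))                                # Blue
print(majority_vote(["Blue", "Purple", "Yellow", "Blue", "Purple", "Yellow"]))  # None
```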

26 of 32

Feature Engineering

  • 1. This is the rat that ate the malt.
  • 2. This is the cat that killed the rat.

      This  Is  the  rat  cat  that  ate  killed  malt
Doc1  1     1   1    1    0    1     1    0       1
Doc2  1     1   1    1    1    1     0    1       0
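The table is a binary bag-of-words encoding: each column is a vocabulary word, and each cell records whether that word occurs in the document ("the" occurs twice in each sentence but is still recorded as 1). A minimal standard-library sketch that reproduces it; the regex tokenizer and the helper names are assumptions:

```python
import re

def tokenize(text):
    """Lowercase and keep runs of letters, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def binary_bow(docs, vocabulary):
    """One row per document: 1 if the vocabulary word occurs in it, else 0."""
    rows = []
    for doc in docs:
        tokens = set(tokenize(doc))
        rows.append([1 if word in tokens else 0 for word in vocabulary])
    return rows

docs = [
    "This is the rat that ate the malt.",
    "This is the cat that killed the rat.",
]
vocabulary = ["this", "is", "the", "rat", "cat", "that", "ate", "killed", "malt"]
for name, row in zip(["Doc1", "Doc2"], binary_bow(docs, vocabulary)):
    print(name, row)
# Doc1 [1, 1, 1, 1, 0, 1, 1, 0, 1]
# Doc2 [1, 1, 1, 1, 1, 1, 0, 1, 0]
```

Counting occurrences instead of presence (so "the" would be 2) is the other common variant. Note that the encoding discards word order: both sentences share "rat" and "that", and nothing records who ate or killed whom.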

27 of 32

Feature Engineering

  • word2vec vs. GloVe

28 of 32

Algorithms 1

  • Naive Bayes
  • Logistic Regression
  • MaxEnt
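Of the three, Naive Bayes is short enough to write from scratch. A minimal sketch of multinomial Naive Bayes over word counts with add-one (Laplace) smoothing; the class, tokenizer, and toy training sentences are hypothetical, and in practice you would reach for a library implementation:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior P(label)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    # Add-one smoothing: a known word absent from this class still gets 1/denom.
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

texts = ["I love this phone", "great battery and great screen",
         "terrible service, very slow", "I hate the slow app"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("great phone"))   # pos
print(model.predict("slow service"))  # neg
```

Working in log space avoids underflow when many small probabilities are multiplied.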

29 of 32

Deep learning and Keras

30 of 32

Machine Learning Libraries

31 of 32

NLTK vs. spaCy

http://blog.thedataincubator.com/2016/04/nltk-vs-spacy-natural-language-processing-in-python/

32 of 32

Skip-thoughts

  • With limited annotation, use sentence vectors and their similarity to extend labels from annotated sentences to unannotated ones.

https://github.com/ryankiros/skip-thoughts
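A minimal sketch of the idea: embed every sentence (skip-thought vectors in this talk), then give each unlabeled sentence the label of its most similar labeled sentence, but only when the cosine similarity clears a threshold. The three-dimensional vectors, the label names, and the 0.9 threshold are hypothetical stand-ins, not real encoder output:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def extend_labels(labeled, unlabeled, threshold=0.9):
    """labeled: non-empty list of (vector, label); unlabeled: list of vectors.
    Returns one label per unlabeled vector, or None if no neighbour is similar enough."""
    result = []
    for vec in unlabeled:
        best_vec, best_label = max(labeled, key=lambda item: cosine(vec, item[0]))
        result.append(best_label if cosine(vec, best_vec) >= threshold else None)
    return result

labeled = [([1.0, 0.0, 0.0], "billing"), ([0.0, 1.0, 0.0], "shipping")]
unlabeled = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(extend_labels(labeled, unlabeled))  # ['billing', None]
```

The threshold trades coverage for precision: raise it and fewer sentences get labels, but the ones that do are more likely to be right. Sentences left as None are natural candidates for the next round of human annotation.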