1 of 21

CDW/Nike

LI Custom Lesson

Natural Language Processing

2 of 21

Natural Language Processing

What is NLP?
Computers and Language
Text Classification
Biased Datasets

3 of 21

What is Natural Language Processing?

Formal Languages - Very easy for computers to understand

Natural Languages - A little more complicated…

4 of 21

What is Natural Language Processing?

Helping computers “understand” human language
Involves computer science, artificial intelligence, linguistics, cognitive science

5 of 21

What does it mean to understand language?

That depends on the task…

6 of 21

What does it mean to understand language?

That depends on the task…

Speech recognition
Question answering systems
Machine translation
Summarization
And much more…

7 of 21

How can computers understand language?

Option 1: Teaching computers the rules of language

Option 2: Learning from real language data
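To make Option 1 concrete, here is a minimal sketch of a hand-written rule for spotting spam. The keyword list and the function name is_spam_by_rules are assumptions made up for illustration, not part of the lesson's code.

```python
# Hypothetical hand-picked keywords; a real rule set would be much larger.
SPAM_KEYWORDS = {"rolex", "software", "$"}


def is_spam_by_rules(message):
    """Flag a message as spam if it contains any hand-picked keyword."""
    words = set(message.lower().split())
    return bool(words & SPAM_KEYWORDS)


print(is_spam_by_rules("rolex watches starting under $ 199 . 99"))  # True
print(is_spam_by_rules("fw : final revised document . thanks ."))  # False
# This message matches none of the rules, so it is not flagged.
print(is_spam_by_rules("unlicensed installation found on your computer"))  # False
```

Rules like these only catch what their author thought of in advance, which is one reason to consider Option 2.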

8 of 21

Machine Learning

The computer learns (gets better at a task) as it receives more data

Data

Algorithm

Predictions about unseen data

9 of 21

Classification

any software just for 15 $ - 99 $
perspective on ferc regulatory action client conf call
unlicensed installation found on your computer
rolex watches starting under $ 199 . 99
fw : final revised document . thanks .

10 of 21

Classification

SPAM any software just for 15 $ - 99 $

OK perspective on ferc regulatory action client conf call

SPAM unlicensed installation found on your computer

SPAM rolex watches starting under $ 199 . 99

OK fw : final revised document . thanks .

11 of 21

Naive Bayes Classifier

We will write a program that learns to differentiate between spam and non-spam (ham) messages
Our model will be based on word counts
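As a rough sketch of what a word-count model can look like, the code below trains a tiny Naive Bayes classifier on the five labelled messages from the Classification slide. It is not the code-along solution: the function names train and classify, whitespace tokenization, and add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

# Toy training set: the five labelled messages from the Classification slide
# ("OK" is written as "ham" here).
TRAIN = [
    ("spam", "any software just for 15 $ - 99 $"),
    ("ham", "perspective on ferc regulatory action client conf call"),
    ("spam", "unlicensed installation found on your computer"),
    ("spam", "rolex watches starting under $ 199 . 99"),
    ("ham", "fw : final revised document . thanks ."),
]


def train(examples):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Start from the prior: how common this label is in the training data.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            # Add-one smoothing (an assumption) keeps unseen words from
            # producing a probability of zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, doc_counts = train(TRAIN)
print(classify("rolex watches for 99 $", word_counts, doc_counts))  # spam
print(classify("final revised document", word_counts, doc_counts))  # ham
```

Five messages are far too few for a useful model; the point is only to show how word counts turn into a prediction.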

12 of 21

Code Along

Code Along: Text Classification

13 of 21

Wait, what did we just do?

We taught our classifier what spam looks like using training data

14 of 21

Problems?

What limitations could a model based on our training data have?

15 of 21

What limitations could our training data have?

What limitations could a model based on our training data have?

Words our classifier hasn’t seen before
Sometimes, looking at words separately might not be enough
The email messages an Enron employee gets could be very different from the messages that you and I get
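The first two limitations can be illustrated with a small sketch. The helper names unseen_fraction and word_pairs are hypothetical, and the one-message vocabulary is only there to keep the example tiny.

```python
def unseen_fraction(message, vocabulary):
    """Share of words in the message that never appeared in training."""
    words = message.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w not in vocabulary) / len(words)


def word_pairs(message):
    """Adjacent word pairs, which keep some context that single words lose."""
    words = message.lower().split()
    return list(zip(words, words[1:]))


# Vocabulary built from a single training message, for illustration only.
vocabulary = set("rolex watches starting under $ 199 . 99".split())

# "cheap" was never seen in training, so one of the three words is unknown.
print(unseen_fraction("cheap rolex watches", vocabulary))

# Looking at pairs keeps "not" next to the word it modifies.
print(word_pairs("not a scam"))
```

A high unseen fraction is a hint that the message looks unlike the training data, so the prediction deserves less trust.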

16 of 21

What else is trained on real language data?

17 of 21

Human data → Human biases

Our biases (both our own and our society’s) are reflected in our language
When AI learns from human language, it also learns human biases

18 of 21

Human data → Human biases

19 of 21

Amazon Recruitment AI

Text classifier that sorts job applications into top candidates to review and other candidates to discard
Training Data: Previous applications accepted or rejected by human recruiters
Result: Applications containing the word “women’s” or the name of a women’s college received lower ratings

20 of 21

More Examples throughout AI

21 of 21

Project

Project: Text Processing