1 of 21

CDW/Nike

LI Custom Lesson

Natural Language Processing

2 of 21

Natural Language Processing

  • What is NLP?
  • Computers and Language
  • Text Classification
  • Biased Datasets

3 of 21

What is Natural Language Processing?

Formal Languages - Very easy for computers to understand

Natural Languages - A little more complicated…

4 of 21

What is Natural Language Processing?

  • Helping computers “understand” human language
  • Involves computer science, artificial intelligence, linguistics, cognitive science

5 of 21

What does it mean to understand language?

That depends on the task…

6 of 21

What does it mean to understand language?

That depends on the task…

  • Speech recognition
  • Question answering systems
  • Machine translation
  • Summarization
  • And much more…

7 of 21

How can computers understand language?

Option 1: Teaching computers the rules of language

Option 2: Learning from real language data

8 of 21

Machine Learning

  • The computer learns (gets better at a task) as it receives more data

Data → Algorithm → Predictions about unseen data

9 of 21

Classification

  1. any software just for 15 $ - 99 $
  2. perspective on ferc regulatory action client conf call
  3. unlicensed installation found on your computer
  4. rolex watches starting under $ 199 . 99
  5. fw : final revised document . thanks .

10 of 21

Classification

SPAM any software just for 15 $ - 99 $

OK perspective on ferc regulatory action client conf call

SPAM unlicensed installation found on your computer

SPAM rolex watches starting under $ 199 . 99

OK fw : final revised document . thanks .

11 of 21

Naive Bayes Classifier

  • We will write a program that learns to differentiate between spam and non-spam (ham) messages
  • Our model will be based on word counts
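
A minimal sketch of a word-count Naive Bayes classifier follows. It is not the code-along's actual program, which is not reproduced in these slides. The training examples are the five subject lines from the Classification slides, with "OK" written as "ham". The function names `train` and `classify` and the use of add-one smoothing are assumptions made for illustration.

```python
import math
from collections import Counter

# The five labeled subject lines from the Classification slides ("OK" -> "ham").
TRAINING_DATA = [
    ("spam", "any software just for 15 $ - 99 $"),
    ("ham", "perspective on ferc regulatory action client conf call"),
    ("spam", "unlicensed installation found on your computer"),
    ("spam", "rolex watches starting under $ 199 . 99"),
    ("ham", "fw : final revised document . thanks ."),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing (an assumption) keeps unseen words from
            # producing a probability of zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING_DATA)
print(classify("rolex watches for $ 99", word_counts, label_counts))
print(classify("final revised document", word_counts, label_counts))
```

With only five training messages the model is a toy; a real classifier needs far more data.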

12 of 21

Code Along

13 of 21

Wait, what did we just do?

We taught our classifier what spam looks like using training data

14 of 21

Problems?

What limitations could a model based on our training data have?

15 of 21

What limitations could our training data have?

What limitations could a model based on our training data have?

  • Words our classifier hasn’t seen before
  • Sometimes, looking at words separately might not be enough
  • The email messages an Enron employee gets could be very different from the messages that you and I get
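
The first two limitations can be illustrated with a short sketch. The helper names `smoothed_probability` and `bigrams` are hypothetical, and add-one smoothing and word pairs are common workarounds rather than anything these slides prescribe. For simplicity, the vocabulary size is taken to be the number of distinct words seen in training.

```python
from collections import Counter


def smoothed_probability(word, counts, vocab_size, alpha=1.0):
    """Estimate P(word) with add-alpha smoothing so unseen words are not zero."""
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * vocab_size)


def bigrams(text):
    """Pair each word with the next one so word order carries information."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


# Word counts from one spam subject line on the Classification slides.
spam_counts = Counter("rolex watches starting under $ 199 . 99".split())
vocab_size = len(spam_counts)

# "bitcoin" never appears in training: its raw count is 0, but the
# smoothed estimate stays above zero.
print(spam_counts["bitcoin"])
print(smoothed_probability("bitcoin", spam_counts, vocab_size))

# Looking at word pairs instead of single words.
print(bigrams("conf call"))
```

Neither workaround helps with the third limitation: a model trained on one person's or one company's email still reflects that mailbox.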

16 of 21

What else is trained on real language data?

17 of 21

Human data → Human biases

  • Our biases (both our own and our society’s) are reflected in our language
  • When AI learns from human language, it also learns human biases

18 of 21

Human data → Human biases

19 of 21

  • Text classifier that sorts job applications into top candidates to review and other candidates to discard
  • Training Data: Previous applications accepted or rejected by human recruiters
  • Result: Having the word “woman” or the name of a women’s college in the application → lower rating

20 of 21

More Examples throughout AI

21 of 21

Project