NLP and Sentiment Analysis with Watson
Thomas Allen, Qian Wang, Jonathan Tizon, Yan Ren
History
Some Uses of NLP
Semantics
Lexical Analysis or Tokenization
A person, as a natural-language speaker, sees words and their meanings.
A computer simply sees a string object.
White Space Tokenization
The words in this string can easily be tokenized by splitting it on whitespace
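A minimal sketch of whitespace tokenization in Python, using the example sentence from the slides; note how punctuation stays attached to the tokens, which leads into the next slide:

```python
sentence = "It's rather like a lifetime special -- pleasant, sweet, and forgettable."
tokens = sentence.split()  # split on runs of whitespace
print(tokens)
# Punctuation stays glued to words ("pleasant," and "forgettable."), one reason
# naive whitespace tokenization is not enough on its own.
```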
Difficulties in Tokenization
Syntactic Analysis
Categorize segments through context-free grammars that define a natural language
A basic grammar defining English
Parse Tree
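A context-free grammar can be turned into a parse tree with a tiny recursive-descent parser; a minimal sketch with a toy grammar (the grammar and sentence here are invented for illustration):

```python
# Toy context-free grammar: S -> NP VP, NP -> Det N, VP -> V NP
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"]],
}

def parse(symbol, tokens, pos):
    """Recursive descent: return (parse_tree, next_position) or None."""
    for production in GRAMMAR.get(symbol, []):
        children, cur, ok = [], pos, True
        for part in production:
            if part in GRAMMAR:                      # non-terminal: recurse
                result = parse(part, tokens, cur)
                if result is None:
                    ok = False
                    break
                subtree, cur = result
                children.append(subtree)
            else:                                    # terminal: match next token
                if cur < len(tokens) and tokens[cur] == part:
                    children.append(part)
                    cur += 1
                else:
                    ok = False
                    break
        if ok:
            return (symbol, children), cur
    return None

tree, end = parse("S", "the dog sees the cat".split(), 0)
print(tree)
```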
Pragmatics
Pragmatic Analysis
We always pragmatically say more than we semantically say
Resolves ambiguity by recovering meaning not found in the plain text (semantics alone)
Sentiment Analysis
We want smarter algorithms!
The simplest example
“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”
Now let’s assume...
“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”
sweet: Good = 46, Bad = 22
-Goodness = 46/(46+22) = 0.68
-Badness = 22/(46+22) = 0.32
Data from Sentdex
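The goodness/badness ratios can be computed directly from the counts; a minimal sketch using the counts for "sweet" from the slide:

```python
good, bad = 46, 22                 # reviews containing "sweet": positive, negative
goodness = good / (good + bad)     # share of occurrences in positive reviews
badness = bad / (good + bad)       # share of occurrences in negative reviews
print(round(goodness, 2), round(badness, 2))  # 0.68 0.32
```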
Other polarized words, e.g.:
“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”
sweet: Good = 46, Bad = 22
-Goodness = 46/(46+22) = 0.68
-Badness = 22/(46+22) = 0.32
pleasant: Good = 15, Bad = 6
-Goodness = 15/(15+6) = 0.71
-Badness = 6/(15+6) = 0.29
forgettable: Good = 10, Bad = 14
-Goodness = 10/(10+14) = 0.42
-Badness = 14/(10+14) = 0.58
What about neutral words?
“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”
it's: Good = 506, Bad = 507
-Goodness = 506/(506+507) = 0.50
-Badness = 507/(506+507) = 0.50
Words | Good | Bad | Goodness | Badness |
it’s | 506 | 507 | 0.50 | 0.50 |
rather | 43 | 63 | 0.41 | 0.59 |
like | 242 | 396 | 0.38 | 0.62 |
a | 3346 | 3112 | 0.52 | 0.48 |
lifetime | 3 | 5 | 0.38 | 0.62 |
special | 29 | 40 | 0.42 | 0.58 |
pleasant | 15 | 6 | 0.71 | 0.29 |
sweet | 46 | 22 | 0.68 | 0.32 |
and | 3198 | 2371 | 0.57 | 0.43 |
forgettable | 10 | 14 | 0.42 | 0.58 |
Goodness:
Badness:
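One simple way to turn the per-word scores from the table into a sentence score is to average them; the averaging rule is an assumption here for illustration, not necessarily the exact combination used on the slide:

```python
# (good, bad) review counts per word, taken from the table above
counts = {
    "it's": (506, 507), "rather": (43, 63), "like": (242, 396),
    "a": (3346, 3112), "lifetime": (3, 5), "special": (29, 40),
    "pleasant": (15, 6), "sweet": (46, 22), "and": (3198, 2371),
    "forgettable": (10, 14),
}
goodness = {w: g / (g + b) for w, (g, b) in counts.items()}

# Assumed combination rule: average per-word goodness over the sentence
score = sum(goodness.values()) / len(goodness)
print(round(score, 2))  # roughly 0.5: the sentence is nearly neutral overall
```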
Bag of Words model
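A bag-of-words model keeps only word counts and discards word order; a minimal sketch with Python's `collections.Counter` (the sample text is invented for illustration):

```python
from collections import Counter

review = "sweet and pleasant and sweet"
bag = Counter(review.split())   # maps word -> count; order is discarded
print(bag["sweet"], bag["and"], bag["pleasant"])  # 2 2 1
```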
Introduction to Bayes’ Algorithm
Bayes’ theorem: P(A|B) = P(B|A) × P(A) / P(B)
P(A|B) - Conditional probability of event A occurring given that event B is true.
P(B|A) - Conditional probability of event B occurring given that event A is true.
P(A) and P(B) - Probabilities of events A and B occurring, respectively.
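As a worked example of Bayes' theorem with the "pleasant" counts from earlier, assuming a hypothetical corpus of 1000 reviews per class and equal class priors (both assumptions for illustration):

```python
# Hypothetical corpus: 1000 positive and 1000 negative reviews (assumed)
p_word_given_pos = 15 / 1000    # "pleasant" appears in 15 positive reviews
p_word_given_neg = 6 / 1000     # ... and in 6 negative reviews
p_pos = p_neg = 0.5             # equal class priors (assumed)

# Bayes' theorem: P(pos | word) = P(word | pos) P(pos) / P(word)
p_pos_given_word = (p_word_given_pos * p_pos) / (
    p_word_given_pos * p_pos + p_word_given_neg * p_neg)
print(round(p_pos_given_word, 2))  # 0.71, matching the goodness score for "pleasant"
```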
Naive Bayes Algorithm
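A naive Bayes classifier treats each word as independent given the class and multiplies the per-word likelihoods (added in log space for numerical stability). A minimal sketch; the per-class corpus size of 1000 and add-one smoothing are assumptions for illustration:

```python
import math

# Hypothetical per-word counts in positive/negative reviews (assumed corpus
# of 1000 reviews per class, for illustration)
pos_counts = {"pleasant": 15, "sweet": 46, "forgettable": 10}
neg_counts = {"pleasant": 6, "sweet": 22, "forgettable": 14}
N = 1000

def log_likelihood(words, counts):
    # Naive independence assumption: sum per-word log-likelihoods,
    # with add-one smoothing so unseen words don't zero everything out
    return sum(math.log((counts.get(w, 0) + 1) / (N + 2)) for w in words)

words = ["pleasant", "sweet", "forgettable"]
pos_score = log_likelihood(words, pos_counts)   # equal priors cancel out
neg_score = log_likelihood(words, neg_counts)
print("positive" if pos_score > neg_score else "negative")
```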
N-grams
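N-grams extend bag-of-words by keeping runs of n consecutive tokens, so some word order survives; a minimal sketch:

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "pleasant sweet and forgettable".split()
print(ngrams(tokens, 2))
# [('pleasant', 'sweet'), ('sweet', 'and'), ('and', 'forgettable')]
```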
NLP Libraries
Node.js
Python
Java
NLP Services
Watson NLU Features
Concept
Category
Emotion
Entity
Keywords
Metadata
Relations
Semantic Roles
Sentiment
Set Up IBM Cloud
pip install --upgrade watson-developer-cloud
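Watson Natural Language Understanding's `/v1/analyze` endpoint accepts a JSON body with the text and a `features` object selecting the analyses above. A minimal sketch that only builds the request body; credentials and the service URL are omitted here, and the `limit` value is just an example:

```python
import json

# Request body for Watson NLU's POST /v1/analyze endpoint; feature names
# (sentiment, emotion, keywords) match the feature list above.
payload = {
    "text": "It's rather like a lifetime special -- pleasant, sweet, and forgettable.",
    "features": {
        "sentiment": {},
        "emotion": {},
        "keywords": {"limit": 3},   # example option: top 3 keywords
    },
}

body = json.dumps(payload)   # this JSON string is what gets POSTed to the service
print(body[:40])
```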
Thank you
Questions?