1 of 29

NLP and Sentiment Analysis with Watson

Thomas Allen, Qian Wang, Jonathan Tizon, Yan Ren

2 of 29

History

  • Definition: Interaction between human (natural) language and computer (artificial) language
  • Components
    • Semantic
    • Syntax
    • Context
  • Timeline
    • Machine Translation (1950s)
    • Artificial Intelligence (ELIZA)
    • Statistical NLP

3 of 29

Some Uses of NLP

  • Sentiment analysis
  • Machine translation
  • Text classification
  • Text summarization
  • Artificial intelligence

4 of 29

Semantics

5 of 29

Lexical Analysis or Tokenization

A person or natural language speaker

  • Naturally separates words by spaces or punctuation
  • Can easily identify linguistic units

A computer would simply see a string object.

  • Will need to have linguistic units defined, such as words, punctuation, alphanumerics, etc.
  • Must have delimiters defined for segmentation.

6 of 29

White Space Tokenization

The words in this string can easily be tokenized by splitting on whitespace

  • “The quick brown fox jumps over the lazy dog.”
  • [“The”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog.”]
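A minimal sketch of this in Python, where `str.split` performs exactly this whitespace segmentation:

```python
# Whitespace tokenization: split the string on runs of whitespace.
sentence = "The quick brown fox jumps over the lazy dog."
tokens = sentence.split()
print(tokens)
# Note that the trailing period stays attached to "dog." -- whitespace
# splitting alone does not separate punctuation from words.
```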

7 of 29

Difficulties in Tokenization

  • Dirty text
    • Grammatical/syntactic errors in the original document
  • Special characters, abbreviations, contractions
    • “He’s” could mean “he is” or “he has”
    • “CAT” could mean the animal or a CAT (CT) scan
    • “60 second long videos” vs. “60-second long videos”
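One common workaround is a regular-expression tokenizer that keeps word characters together and emits punctuation as separate tokens. This is a sketch, not a full solution; contractions like “He’s” still need later disambiguation:

```python
import re

# Keep word characters (plus an internal apostrophe) together; emit any
# other non-space character as its own token. "He's" survives as one
# token, but deciding "he is" vs. "he has" is a later, contextual step.
def tokenize(text):
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(tokenize("He's watching 60-second long videos."))
```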

8 of 29

Syntactic Analysis

Categorize segments using context-free grammars that define the natural language

A basic grammar defining English

  • [Sentence] -> [Noun Phrase] [Verb Phrase]
  • [Noun Phrase] -> [Article]|[Adjective] [Noun]
  • [Verb Phrase] -> [Verb][Noun Phrase]|[Prepositional Phrase]
  • [Verb Phrase] -> [Verb][Noun Phrase][Prepositional Phrases]
  • [Verb Phrase] -> [Verb][Noun Phrase][Adverb]
  • [Prepositional Phrase] -> [Preposition][Noun Phrase]
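A simplified subset of such a grammar can be written down as Python data. The tiny lexicon and the always-take-the-first-rule expansion below are illustrative assumptions, just to show how productions derive a sentence:

```python
# A cut-down version of the grammar above, with an invented lexicon.
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["Article", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Article": [["the"]],
    "Noun": [["fox"], ["dog"]],
    "Verb": [["chases"]],
}

def expand(symbol):
    if symbol not in grammar:           # terminal: an actual word
        return [symbol]
    production = grammar[symbol][0]     # deterministically take the first rule
    words = []
    for part in production:
        words.extend(expand(part))
    return words

print(" ".join(expand("S")))  # derives: "the fox chases the fox"
```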

9 of 29

Parse Tree

  • A dictionary of definitions isn’t enough to determine the meaning of a word.
  • A word’s location in the parse tree helps determine its part of speech
  • Part of speech helps determine the word’s meaning
  • Example: “Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo,” i.e., buffalo from Buffalo that other buffalo from Buffalo bully [themselves] bully buffalo from Buffalo.

10 of 29

Pragmatics

11 of 29

Pragmatic Analysis

We always pragmatically say more than we semantically say

Resolves ambiguity by supplying meaning not found in the plain text/semantics alone

  • Done through contextual information
  • Applies real world knowledge

12 of 29

Sentiment Analysis

13 of 29

We want smarter algorithms!

  • Sentiment analysis breaks down into two questions
    • Polarity: was the text positive or negative? (Bag of Words)
    • Degree: how positive or negative was the text? (Naive Bayes classification)
  • An intelligent approach to analyzing human emotion
  • Allows the extraction of key entities and semantics

14 of 29

The simplest example

“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”

  • Sentiment analysis is not about understanding the sentence in any meaningful way
    • Think of it as pattern analysis of characters, without any external influences or intuition
  • How should we leverage data to classify this sentence?
    • Most obvious: find similar sentences in the data set
    • Even the mere presence of positive words, such as “sweet” and “pleasant”, can be a good indicator

15 of 29

Now let’s assume...

“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”

“sweet”: Good: 46, Bad: 22

  • Goodness = 46/(46+22) = 0.68
  • Badness = 22/(46+22) = 0.32

Data from Sentdex

16 of 29

Other polarized words ex:

“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”

“sweet”: Good: 46, Bad: 22

  • Goodness = 46/(46+22) = 0.68
  • Badness = 22/(46+22) = 0.32

“pleasant”: Good: 15, Bad: 6

  • Goodness = 15/(15+6) = 0.71
  • Badness = 6/(15+6) = 0.29

“forgettable”: Good: 10, Bad: 14

  • Goodness = 10/(10+14) = 0.42
  • Badness = 14/(10+14) = 0.58

Data from Sentdex

17 of 29

What about neutral words?

“It’s rather like a lifetime special -- pleasant, sweet, and forgettable.”

“sweet”: Good: 46, Bad: 22

  • Goodness = 46/(46+22) = 0.68
  • Badness = 22/(46+22) = 0.32

“pleasant”: Good: 15, Bad: 6

  • Goodness = 15/(15+6) = 0.71
  • Badness = 6/(15+6) = 0.29

“forgettable”: Good: 10, Bad: 14

  • Goodness = 10/(10+14) = 0.42
  • Badness = 14/(10+14) = 0.58

“it’s”: Good: 506, Bad: 507

  • Goodness = 506/(506+507) = 0.50
  • Badness = 507/(506+507) = 0.50

Data from Sentdex

18 of 29

Word          Good    Bad    Goodness    Badness
it’s           506    507        0.50       0.50
rather          43     63        0.40       0.60
like           242    396        0.61       0.39
a             3346   3112        0.53       0.47
lifetime         3      5        0.38       0.62
special         29     40        0.42       0.58
pleasant        15      6        0.71       0.29
sweet           46     22        0.68       0.32
and           3198   2371        0.57       0.43
forgettable     10     14        0.42       0.58

Goodness:

  • 5.22

Badness:

  • 4.8

19 of 29

Bag of Words model

  • Sentences are classified based on the number of positive and negative words they contain.
    • The test sentence would be classified as positive
  • Note that this approach ignores word order entirely
    • Polarity: was the text positive or negative?
    • Hence the name “Bag of Words”
  • Example:
    • “It’s rather like a lifetime special -- pleasant, not sweet, and forgettable.”
      • “Sweet” would still contribute 0.68 positivity despite the negation
      • The algorithm would still operate at around 80% accuracy without actually understanding the sentence
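The whole model fits in a few lines. This sketch uses the per-word goodness/badness ratios from the table (data attributed to Sentdex); the `classify` helper and pre-tokenized input are illustrative, not any library's API:

```python
# Bag-of-words polarity scoring: sum each word's goodness and badness,
# then compare the totals. Word order never enters the computation.
scores = {  # word: (goodness, badness), from the slide's table
    "it's": (0.5, 0.5),       "rather": (0.4, 0.6),
    "like": (0.61, 0.39),     "a": (0.53, 0.47),
    "lifetime": (0.38, 0.62), "special": (0.42, 0.58),
    "pleasant": (0.71, 0.29), "sweet": (0.68, 0.32),
    "and": (0.57, 0.43),      "forgettable": (0.42, 0.58),
}

def classify(tokens):
    goodness = sum(scores[t][0] for t in tokens if t in scores)
    badness = sum(scores[t][1] for t in tokens if t in scores)
    label = "positive" if goodness > badness else "negative"
    return label, goodness, badness

tokens = ["it's", "rather", "like", "a", "lifetime", "special",
          "pleasant", "sweet", "and", "forgettable"]
label, g, b = classify(tokens)
# Reproduces the slide's totals: goodness 5.22, badness about 4.8
print(label, round(g, 2), round(b, 2))
```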

20 of 29

Introduction to Bayes’ Algorithm

  • Naive Bayes is built on Bayes’ theorem
    • Assumes each data point is independent of the others
    • This allows an arbitrary number of properties to be taken into consideration
  • Theorem -
    • States that the probability of event A given B is equal to the probability of event B given A, multiplied by the probability of A, divided by the probability of B.
  • Equation:

P(A|B) = P(B|A) · P(A) / P(B)

P(A|B) - Conditional probability of event A occurring given event B is true.

P(B|A) - Conditional probability of event B occurring given event A is true.

P(A) and P(B) - Probabilities of event A and event B occurring respectively
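A worked numeric example, with made-up probabilities chosen purely for illustration:

```python
# Bayes' theorem with invented numbers:
# A = "review is positive", B = "review contains the word 'pleasant'".
p_a = 0.6               # P(A): prior probability a review is positive
p_b_given_a = 0.2       # P(B|A): "pleasant" appears in positive reviews
p_b_given_not_a = 0.05  # "pleasant" appears in negative reviews

# Law of total probability gives P(B), the chance of seeing "pleasant" at all.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # 0.12 + 0.02 = 0.14

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.12 / 0.14 -> about 0.86
```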

21 of 29

Naive Bayes Algorithm

  • The algorithm calculates the conditional probability of each class for an object with i feature vectors
  • Machine learning classification terms:
    • Classes (Ci) - the outcomes that are being classified through machine learning
    • Feature vectors (xi) - base features, assumed independent of each other’s influence
  • Often paired with n-grams to increase accuracy
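The classification rule above can be sketched in a few lines. The toy counts below are invented for illustration (not the Sentdex data), and the uniform priors are an assumption:

```python
import math

# Naive Bayes for sentiment: P(class | words) is proportional to
# P(class) * product of P(word | class). Log-space avoids underflow.
word_counts = {
    "pos": {"pleasant": 15, "sweet": 46, "forgettable": 10},
    "neg": {"pleasant": 6,  "sweet": 22, "forgettable": 14},
}
priors = {"pos": 0.5, "neg": 0.5}

def classify(tokens):
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        total = sum(counts.values())
        # Laplace (add-one) smoothing so unseen words don't zero the product
        score = math.log(priors[label]) + sum(
            math.log((counts.get(t, 0) + 1) / (total + len(counts)))
            for t in tokens
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify(["pleasant", "sweet"]))  # pos
print(classify(["forgettable"]))        # neg
```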

22 of 29

N-grams

  • A contiguous sequence of n items from a given data sample.
  • An independence assumption is made on each word given the previous n-1 words in the series.
    • Pro: groups words that are unknown to the language interpreter, to better handle the open-ended nature of languages.
    • Con: without smoothing, imbalances can occur between infrequent grams (proper names), frequent grams (common words), and unseen grams (not present in the data set)
  • Top five 3-grams beginning with “much”, based on the largest publicly available, genre-balanced corpus of English:
    • Much the same
    • Much more likely
    • Much better than
    • Much more difficult
    • Much of the

Data from Sentdex
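Extracting n-grams from a token list is a one-liner; this minimal formulation slides a window of length n across the sequence:

```python
# Generate all contiguous n-grams from a list of tokens.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "much better than the original".split()
print(ngrams(tokens, 3))
# [('much', 'better', 'than'), ('better', 'than', 'the'),
#  ('than', 'the', 'original')]
```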

23 of 29

NLP Libraries

Node.js

  • Twitter-text
  • Knwl.js
  • Retext
  • Natural

Python

  • TextBlob
  • SpaCy
  • Gensim
  • ScatterText
  • AllenNLP
  • PyNLPl

Java

  • Stanford NLP
  • OpenNLP
  • ClearNLP

24 of 29

NLP Services

25 of 29

Watson NLP Apps

26 of 29

Concept

Category

  • /News
  • /international news
  • /art and entertainment

Emotion

  • Anger
  • Disgust
  • Sadness
  • Fear
  • Joy

Entity

  • People
  • Places
  • Events

Keywords

Metadata

  • Author
  • Page title
  • Publication date

Relations

Semantic Roles

  • Subject
  • Action
  • Object

Sentiment

  • Positive
  • Negative

27 of 29

  • Identify tones at the sentence and document level
  • This insight can be used to refine and improve communications
  • Emotion
    • Sadness, Disgust, Fear, Joy
  • Social propensities
    • Openness, Conscientiousness, Agreeableness
  • Language styles
    • Analytical, Confident, Tentative
  • Classification of text input
  • Train your own classifier with specific training data

28 of 29

Set Up IBM Cloud

  1. Sign in or sign up at https://www.ibm.com/watson/.
  2. Menu → Developer → Cloud → My Cloud Console
  3. Create Service
  4. Add Credentials

pip install --upgrade watson-developer-cloud

29 of 29

Thank you

Questions?