Sentiment Analysis
What is Sentiment Analysis?
Dan Jurafsky
Positive or negative movie review?
Twitter sentiment versus Gallup Poll of Consumer Confidence
Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010
Twitter sentiment:
Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. doi:10.1016/j.jocs.2010.12.007.
[Figure: Dow Jones index plotted against the Twitter "CALM" mood time series; Bollen et al. (2011)]
Target Sentiment on Twitter
Sentiment analysis has many other names
Why sentiment analysis?
Scherer Typology of Affective States
Sentiment Analysis
“enduring, affectively colored beliefs, dispositions towards objects or persons”
Sentiment Analysis
What is Sentiment Analysis?
Sentiment Analysis
A Baseline Algorithm
Sentiment Classification in Movie Reviews
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79—86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278
IMDB data in the Pang and Lee database
when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . […]
when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point .
cool .
_october sky_ offers a much simpler image–that of a single white dot , traveling horizontally across the night sky . [. . . ]
“ snake eyes ” is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing .
it’s not just because this is a brian depalma film , and since he’s a great director and one who’s films are always greeted with at least some fanfare .
and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents .
(The first review is labeled positive ✓, the second negative ✗.)
Baseline Algorithm (adapted from Pang and Lee)
Sentiment Tokenization Issues
Capitalization (preserve for words in all caps)
[<>]? # optional hat/brow
[:;=8] # eyes
[\-o\*\']? # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
| #### reverse orientation
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
[\-o\*\']? # optional nose
[:;=8] # eyes
[<>]? # optional hat/brow
Potts emoticons
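For reference, the Potts pattern above can be compiled directly with Python's `re.VERBOSE` mode, which permits the same whitespace and `#` comments (a sketch; the test string is invented):

```python
import re

# The Potts emoticon pattern above, as a single verbose regex.
EMOTICON = re.compile(r"""
    (?:
      [<>]?                          # optional hat/brow
      [:;=8]                         # eyes
      [\-o\*\']?                     # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
    |                                # or reverse orientation
      [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
      [\-o\*\']?                     # optional nose
      [:;=8]                         # eyes
      [<>]?                          # optional hat/brow
    )""", re.VERBOSE)

print(EMOTICON.findall("great movie :) but the ending 8-( ... (-: ok"))
# → [':)', '8-(', '(-:']
```

Because the alternation is wrapped in a non-capturing group, `findall` returns the full matched emoticons rather than sub-groups.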
Extracting Features for Sentiment Classification
Negation
Add NOT_ to every word between negation and following punctuation:
didn’t like this movie , but I
didn’t NOT_like NOT_this NOT_movie but I
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79—86.
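The NOT_ heuristic above can be sketched in a few lines of Python (the negation and punctuation lists here are minimal illustrations, not the full lists used in the papers):

```python
import re

# Sketch of the Das & Chen / Pang et al. negation heuristic: prefix NOT_
# to every token between a negation word and the next punctuation mark.
NEGATION = re.compile(r"^(?:not|no|never)$", re.IGNORECASE)
PUNCT = re.compile(r"^[.,;:!?]$")

def mark_negation(tokens):
    out, negating = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negating = False            # punctuation ends the negation scope
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION.match(tok) or tok.lower().endswith("n't"):
                negating = True
    return out

print(mark_negation("didn't like this movie , but I".split()))
# → ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```

This doubles the vocabulary (each word gets a NOT_ variant), which is exactly the point: NOT_like can carry a different sentiment weight than like.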
Reminder: Naïve Bayes
cNB = argmax over cj ∈ C of P(cj) · ∏i P(wi | cj), over all word positions i in the test document
Binarized (Boolean feature) Multinomial Naïve Bayes
Boolean Multinomial Naïve Bayes: Learning
From the training corpus, extract the Vocabulary
Calculate P(cj): docsj ← all docs with class cj; P(cj) ← |docsj| / |total # of documents|
Calculate P(wk | cj): Textj ← single document built by concatenating docsj, with duplicate words removed within each document
For each wk in Vocabulary: nk ← # of occurrences of wk in Textj; P(wk | cj) ← (nk + α) / (n + α |Vocabulary|)
Boolean Multinomial Naïve Bayes on a test document d
First remove all duplicate words from d, then compute Naïve Bayes using the same equation.
Normal vs. Boolean Multinomial NB

Normal:
| | Doc | Words | Class |
| Training | 1 | Chinese Beijing Chinese | c |
| | 2 | Chinese Chinese Shanghai | c |
| | 3 | Chinese Macao | c |
| | 4 | Tokyo Japan Chinese | j |
| Test | 5 | Chinese Chinese Chinese Tokyo Japan | ? |

Boolean (duplicates removed within each document):
| | Doc | Words | Class |
| Training | 1 | Chinese Beijing | c |
| | 2 | Chinese Shanghai | c |
| | 3 | Chinese Macao | c |
| | 4 | Tokyo Japan Chinese | j |
| Test | 5 | Chinese Tokyo Japan | ? |
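The Boolean variant on this toy corpus can be worked through in a few lines of Python (a sketch; the function names are mine, and add-1 smoothing, i.e. α = 1, is assumed):

```python
from collections import Counter
from math import prod

train = [("c", "Chinese Beijing Chinese"),
         ("c", "Chinese Chinese Shanghai"),
         ("c", "Chinese Macao"),
         ("j", "Tokyo Japan Chinese")]
test = "Chinese Chinese Chinese Tokyo Japan"

vocab = {w for _, d in train for w in d.split()}
classes = {c for c, _ in train}
prior = {c: sum(1 for cc, _ in train if cc == c) / len(train) for c in classes}

counts = {c: Counter() for c in classes}
for c, d in train:
    counts[c].update(set(d.split()))     # set() = Boolean clipping per document

def p(w, c):                             # add-1 smoothed P(w|c)
    n = sum(counts[c].values())
    return (counts[c][w] + 1) / (n + len(vocab))

def score(doc, c):                       # dedupe the test document too
    return prior[c] * prod(p(w, c) for w in set(doc.split()))

scores = {c: score(test, c) for c in classes}
print(scores)
```

With duplicates clipped, P(Chinese|c) = (3+1)/(6+6) = 1/3, versus 6/14 for the full-count model, so the two variants can score the same test document quite differently on a corpus this small.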
Binarized (Boolean feature) Multinomial Naïve Bayes
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79—86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 - Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
JD Rennie, L Shih, J Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003
Cross-Validation
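A minimal sketch of k-fold cross-validation in pure Python (the function name and data are hypothetical): every item is tested exactly once and trained on in the other k−1 folds.

```python
import random

def cross_validation_folds(data, k=10, seed=0):
    """Shuffle data, split it into k folds, and yield (train, test) pairs."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]      # k near-equal slices
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

data = list(range(100))
sizes = [(len(tr), len(te)) for tr, te in cross_validation_folds(data)]
print(sizes[0])
# → (90, 10)
```

Averaging a classifier's accuracy over the k test folds gives a more stable estimate than a single train/test split.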
Other issues in Classification
Problems: What makes reviews hard to classify?
Thwarted Expectations and Ordering Effects
Sentiment Analysis
A Baseline Algorithm
Sentiment Analysis
Sentiment Lexicons
The General Inquirer
Philip J. Stone, Dexter C Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press
LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX
MPQA Subjectivity Cues Lexicon
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
Bing Liu Opinion Lexicon
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
SentiWordNet
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010 SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010
Example: two senses of "estimable" — "may be computed or estimated": Pos 0, Neg 0, Obj 1; "deserving of respect or high regard": Pos .75, Neg 0, Obj .25
Disagreements between polarity lexicons

| | Opinion Lexicon | General Inquirer | SentiWordNet | LIWC |
| MPQA | 33/5402 (0.6%) | 49/2867 (2%) | 1127/4214 (27%) | 12/363 (3%) |
| Opinion Lexicon | | 32/2411 (1%) | 1004/3994 (25%) | 9/403 (2%) |
| General Inquirer | | | 520/2306 (23%) | 1/204 (0.5%) |
| SentiWordNet | | | | 174/694 (25%) |

Christopher Potts, Sentiment Tutorial, 2011
Analyzing the polarity of each word in IMDB
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
[Figure: scaled likelihood P(w|c)/P(w) of selected words across IMDB rating categories]
Other sentiment feature: Logical negation
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
Potts 2011 Results: More negation in negative sentiment
[Figure: scaled likelihood P(w|c)/P(w) of negation across rating categories]
Sentiment Analysis
Sentiment Lexicons
Sentiment Analysis
Learning Sentiment Lexicons
Semi-supervised learning of lexicons
Hatzivassiloglou and McKeown intuition for identifying word polarity
Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181
Hatzivassiloglou & McKeown 1997, Step 1
Hatzivassiloglou & McKeown 1997, Step 2
Expand the seed set to conjoined adjectives: adjectives joined by "and" tend to share polarity (nice, helpful; nice, classy).
Hatzivassiloglou & McKeown 1997, Step 3
A supervised classifier assigns a "polarity similarity" to each word pair, producing a graph over adjectives such as classy, nice, helpful, fair, brutal, irrational, corrupt.
Hatzivassiloglou & McKeown 1997, Step 4
Clustering partitions the graph into a positive set (classy, nice, helpful, fair) and a negative set (brutal, irrational, corrupt).
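The Hatzivassiloglou & McKeown pipeline (seed polarities, a graph of polarity-similar word pairs, then partitioning) can be sketched with a simple label propagation; this is an illustration with invented edges and seeds, not the paper's actual clustering method:

```python
# Illustrative sketch: spread seed polarities over a graph whose edges link
# adjectives judged to share polarity (e.g. from "and"-conjunctions).
edges = {("nice", "helpful"), ("nice", "classy"), ("helpful", "fair"),
         ("brutal", "irrational"), ("irrational", "corrupt")}
seeds = {"nice": "+", "brutal": "-"}

def propagate(edges, seeds):
    labels = dict(seeds)
    neighbors = {}
    for a, b in edges:                   # build an undirected adjacency map
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    changed = True
    while changed:                       # spread labels to unlabeled neighbors
        changed = False
        for w, lab in list(labels.items()):
            for nb in neighbors.get(w, ()):
                if nb not in labels:
                    labels[nb] = lab
                    changed = True
    return labels

print(propagate(edges, seeds))
```

Two seed words are enough here to label every node in their connected components, mirroring how the paper grows a full polarity lexicon from a small labeled seed set.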
Output polarity lexicon
Turney Algorithm
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
Extract two-word phrases with adjectives
First Word | Second Word | Third Word (not extracted) |
JJ | NN or NNS | anything |
RB, RBR, or RBS | JJ | not NN nor NNS |
JJ | JJ | not NN nor NNS |
NN or NNS | JJ | not NN nor NNS |
RB, RBR, or RBS | VB, VBD, VBN, VBG | anything |
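The pattern table above translates directly into code; a sketch over Penn Treebank-tagged input (the function name and example sentence are mine):

```python
# Sketch of Turney's two-word phrase extraction over POS-tagged text.
ADV = {"RB", "RBR", "RBS"}
NOUN = {"NN", "NNS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

def extract_phrases(tagged):
    """tagged: list of (word, Penn Treebank tag) pairs."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if t1 == "JJ" and t2 in NOUN:
            phrases.append((w1, w2))             # JJ + NN/NNS
        elif t1 in ADV and t2 == "JJ" and t3 not in NOUN:
            phrases.append((w1, w2))             # RB* + JJ, not before a noun
        elif t1 == "JJ" and t2 == "JJ" and t3 not in NOUN:
            phrases.append((w1, w2))             # JJ + JJ, not before a noun
        elif t1 in NOUN and t2 == "JJ" and t3 not in NOUN:
            phrases.append((w1, w2))             # NN/NNS + JJ, not before a noun
        elif t1 in ADV and t2 in VERB:
            phrases.append((w1, w2))             # RB* + VB/VBD/VBN/VBG
    return phrases

tagged = [("this", "DT"), ("very", "RB"), ("handy", "JJ"),
          ("online", "JJ"), ("service", "NN"), ("works", "VBZ")]
print(extract_phrases(tagged))
# → [('very', 'handy'), ('online', 'service')]
```

Note that "handy online" (JJ JJ) is correctly skipped because a noun follows, matching the third-word constraint in the table.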
How to measure polarity of a phrase?
Pointwise Mutual Information
How much more do two words co-occur than if they were independent?
PMI(word1, word2) = log2 [ P(word1, word2) / (P(word1) · P(word2)) ]
How to Estimate Pointwise Mutual Information
Estimate the probabilities from search-engine hit counts: P(word) ≈ hits(word)/N and P(word1, word2) ≈ hits(word1 NEAR word2)/N, giving
PMI(word1, word2) ≈ log2 [ hits(word1 NEAR word2) · N / (hits(word1) · hits(word2)) ]
Does the phrase appear more with "poor" or "excellent"?
Polarity(phrase) = PMI(phrase, "excellent") − PMI(phrase, "poor")
= log2 [ hits(phrase NEAR "excellent") · hits("poor") / (hits(phrase NEAR "poor") · hits("excellent")) ]
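The polarity formula above reduces to a single log ratio in which the corpus size N cancels; a sketch with made-up hit counts:

```python
from math import log2

def pmi_from_hits(hits_xy, hits_x, hits_y, n):
    # PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), probabilities from hit counts
    return log2((hits_xy / n) / ((hits_x / n) * (hits_y / n)))

def polarity(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor):
    # Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor");
    # the corpus size N and hits(phrase) cancel out of the difference.
    return log2((hits_near_excellent * hits_poor) /
                (hits_near_poor * hits_excellent))

# Hypothetical counts: a phrase seen 320 times near "excellent", 40 near "poor",
# with "excellent" occurring 1,000,000 times and "poor" 500,000 times overall.
print(polarity(320, 40, 1_000_000, 500_000))
# → 2.0
```

A positive score means the phrase keeps better company with "excellent" than with "poor"; summing these scores over a review's phrases gives Turney's thumbs-up/thumbs-down decision.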
Phrases from a thumbs-up review
Phrase | POS tags | Polarity |
online service | JJ NN | 2.8 |
online experience | JJ NN | 2.3 |
direct deposit | JJ NN | 1.3 |
local branch | JJ NN | 0.42 |
… | | |
low fees | JJ NNS | 0.33 |
true service | JJ NN | -0.73 |
other bank | JJ NN | -0.85 |
inconveniently located | RB VBN | -1.5 |
Average | | 0.32 |
Phrases from a thumbs-down review
Phrase | POS tags | Polarity |
direct deposits | JJ NNS | 5.8 |
online web | JJ NN | 1.9 |
very handy | RB JJ | 1.4 |
… | | |
virtual monopoly | JJ NN | -2.0 |
lesser evil | RBR JJ | -2.3 |
other problems | JJ NNS | -2.8 |
low funds | JJ NNS | -6.8 |
unethical practices | JJ NNS | -8.5 |
Average | | -1.2 |
Results of Turney algorithm
Using WordNet to learn polarity
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004
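The WordNet-based approach of Kim & Hovy and Hu & Liu — grow a seed lexicon by giving synonyms the same polarity and antonyms the opposite one — can be sketched as follows; the tiny dictionaries here are hand-made stand-ins for WordNet's synset and antonym relations, and the function name is mine:

```python
# Hypothetical stand-ins for WordNet synonym/antonym lookups.
SYNONYMS = {"good": {"great", "fine"}, "great": {"terrific"},
            "bad": {"poor", "awful"}}
ANTONYMS = {"good": {"bad"}, "great": {"terrible"}}

def expand(seeds, iterations=2):
    """Grow a word -> '+'/'-' lexicon from polarity seeds."""
    lexicon = dict(seeds)
    for _ in range(iterations):
        for word, pol in list(lexicon.items()):
            flip = "-" if pol == "+" else "+"
            for syn in SYNONYMS.get(word, ()):
                lexicon.setdefault(syn, pol)     # synonyms keep the polarity
            for ant in ANTONYMS.get(word, ()):
                lexicon.setdefault(ant, flip)    # antonyms flip it
    return lexicon

print(expand({"good": "+"}))
```

From the single seed "good", two iterations reach synonyms of synonyms (terrific) and antonyms of both (bad, terrible), and then synonyms of those (poor, awful).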
Summary on Learning Lexicons
Sentiment Analysis
Learning Sentiment Lexicons
Sentiment Analysis
Other Sentiment Tasks
Finding sentiment of a sentence
Finding aspect/attribute/target of sentiment
| Business category | Frequent aspects |
| Casino | casino, buffet, pool, resort, beds |
| Children's Barber | haircut, job, experience, kids |
| Greek Restaurant | food, wine, service, appetizer, lamb |
| Department Store | selection, department, sales, shop, clothing |
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
Finding aspect/attribute/target of sentiment
Putting it all together: Finding sentiment for aspects

Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier and Aspect Extractor → Aggregator → Final Summary
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop
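The extractor → classifier → aspect extractor → aggregator flow can be sketched end to end; this is an illustration of the pipeline shape only — the keyword lists and word-counting sentiment score are invented placeholders, not the models of Blair-Goldensohn et al.:

```python
# Hypothetical aspect keywords and sentiment word lists.
ASPECT_KEYWORDS = {"rooms": {"room", "bed"}, "service": {"staff", "service"},
                   "dining": {"food", "buffet"}}
POSITIVE, NEGATIVE = {"clean", "great", "free"}, {"cold", "slow", "worst"}

def summarize(reviews):
    summary = {a: {"+": [], "-": []} for a in ASPECT_KEYWORDS}
    for sentence in reviews:                     # text extractor: per sentence
        words = set(sentence.lower().replace(".", "").split())
        # sentiment classifier: count positive minus negative cue words
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for aspect, kws in ASPECT_KEYWORDS.items():   # aspect extractor
            if words & kws:
                # aggregator: file the sentence under aspect and polarity
                summary[aspect]["+" if score > 0 else "-"].append(sentence)
    return summary

reviews = ["The room was clean.",
           "The food is cold and the service gives new meaning to SLOW."]
print(summarize(reviews))
```

The second sentence lands under both dining and service with negative polarity, which is how one review line can feed several aspect buckets in the final summary.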
Results of Blair-Goldensohn et al. method
Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine – even the water pressure ...
(+) We went because of the free room and was pleasantly pleased ...
(-) …the worst hotel I had ever stayed at ...
Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem ...
(+) Every single hotel staff member treated us great and answered every ...
(-) The food is cold and the service gives new meaning to SLOW.
Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi.the food is great also the service ...
(+) Offer of free buffet for joining the Play
Baseline methods assume classes have equal frequencies!
How to deal with 7 stars?
Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL, 115–124
Summary on Sentiment
Scherer Typology of Affective States
Computational work on other affective states
Detection of Friendliness
Ranganath, Jurafsky, McFarland
Sentiment Analysis
Other Sentiment Tasks