OPINION MINING CLIMATE CHANGE TWEETS
By: Lily Lou, Lizzie Tong, Michael Yang, Haocheng Zhao
Introduction
Related Work
Pak, Alexander, and Patrick Paroubek. “Twitter as a Corpus for Sentiment Analysis and Opinion Mining,” 2010. https://lexitron.nectec.or.th/public/LREC-2010_Malta/pdf/385_Paper.pdf.
Neethu, M S, and R Rajasree. “Sentiment Analysis in Twitter Using Machine Learning Techniques.” In 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), 1–5, 2013. https://doi.org/10.1109/ICCCNT.2013.6726818.
Mucha, Nithisha. “Sentiment Analysis of Global Warming Using Twitter Data,” 2018. https://library.ndsu.edu/ir/handle/10365/28166.
Reyes-Menendez, Ana, José Ramón Saura, and Cesar Alvarez-Alonso. “Understanding #WorldEnvironmentDay User Opinions in Twitter: A Topic-Based Sentiment Analysis Approach.” International Journal of Environmental Research and Public Health 15, no. 11 (November 2018): 2537. https://doi.org/10.3390/ijerph15112537.
Exploratory Topic Analysis
Word | Length | Count | Weighted Percentage (%) | Similar Words |
climate | 7 | 1553 | 6.43 | climat, climate |
changing | 8 | 1447 | 5.99 | change, changed, changes, changing |
global | 6 | 1127 | 4.67 | global, globally |
warming | 7 | 1081 | 4.48 | warm, warming, warms |
new | 3 | 135 | 0.56 | new |
Word | Length | Count | Weighted Percentage (%) | Similar Words |
snowing | 7 | 85 | 0.50 | snow, snowing, snows |
earth | 5 | 82 | 0.48 | earth |
report | 6 | 82 | 0.48 | report, reporter, reporting, reports |
energy | 6 | 77 | 0.45 | energy |
science | 7 | 76 | 0.45 | science, sciences |
scientists | 10 | 68 | 0.40 | scientist, scientists |
volcanoes | 9 | 56 | 0.33 | volcano, volcanoes, volcanos |
CLIMATE CHANGE BELIEVERS
Word | Length | Count | Weighted Percentage (%) | Similar Words |
snowing | 7 | 126 | 1.57 | snow, snowed, snowing, snows |
science | 7 | 59 | 0.74 | science |
scientists | 10 | 32 | 0.40 | scientist, scientists |
weather | 7 | 32 | 0.40 | weather |
cold | 4 | 29 | 0.36 | cold |
hoax | 4 | 28 | 0.35 | hoax |
blizzard | 8 | 27 | 0.34 | blizzard, blizzards |
Word | Length | Count | Weighted Percentage (%) | Similar Words |
global | 6 | 790 | 7.26 | global |
warming | 7 | 785 | 7.21 | warm, warmed, warming |
climate | 7 | 332 | 3.05 | climate, climatism |
change | 6 | 290 | 2.66 | change, changes, changing |
snowing | 7 | 126 | 1.16 | snow, snowed, snowing, snows |
tcot | 4 | 97 | 0.89 | tcot |
gore | 4 | 74 | 0.68 | gore |
CLIMATE CHANGE DENIERS
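The word, length, count, and percentage columns in the tables above can be approximated with a short Python sketch. This is not the tool used for the slides. The function name `word_frequencies` is hypothetical, "weighted percentage" is assumed here to mean a word's share of all counted words, and the grouping of similar words (stemming) is not reproduced.

```python
from collections import Counter

def word_frequencies(tweets, top_n=5):
    """Return (word, length, count, percentage) for the most frequent words.

    Assumption: the percentage is the word's share of all counted words.
    Similar words are NOT merged, unlike in the tables above.
    """
    counts = Counter(
        word for tweet in tweets for word in tweet.lower().split()
    )
    total = sum(counts.values())
    return [
        (word, len(word), count, round(100 * count / total, 2))
        for word, count in counts.most_common(top_n)
    ]

# Toy input for illustration only; not taken from the dataset.
rows = word_frequencies(
    ["global warming is real", "global warming hoax", "climate change"],
    top_n=2,
)
```

Running the sketch separately on the tweets of each group would give one table per group.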
Predicting Sentiment
Sentiment Analysis: Approaches
Approaches: Model
Methodology: Dataset & Cleansing
Tweets containing the keywords “climate change”, scraped from Twitter (credit to IBM employees)
6090 tweets in total. Columns: Tweet, existence, existence.confidence
Watson labeled each tweet as having a negative or positive attitude towards climate change
What does a tweet in our dataset look like?
“RT @mmfa: Brain Freeze: Conservative media still using winter weather to attack global warming: http://bit.ly/9nKEcc #p2 #noisemachine #climate”
“"D.C. Snowstorm: How Global Warming Makes Blizzards Worse - TIME" ( http://bit.ly/9sMGXj ) It's snowing because it's so warm? Cahmaaahn.(via @TonyMackGD) [link]”
A tweet contains a lot of information we don’t need.
There are some things we are not certain about...
And all the punctuation is unnecessary.
After the cleaning process
“well the truth is brian we can t solve global warming because i fucking changed light bulbs in my house barack obama”
“un process in danger unless world agrees on climate change telegraph global warming is a fraud for world government”
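A minimal Python sketch of the cleaning steps above, not the authors' actual code. The function name `clean_tweet` and the `keep_hashtag_words` flag are hypothetical. The slides do not say whether hashtag words or "via" credits were dropped, so those choices are assumptions.

```python
import re

def clean_tweet(text: str, keep_hashtag_words: bool = False) -> str:
    """Lowercase a tweet and strip URLs, retweet markers, mentions, and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"\[link\]", " ", text)       # drop "[link]" placeholders
    text = re.sub(r"\bRT\b", " ", text)         # drop the retweet marker
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    if not keep_hashtag_words:
        # Assumption: hashtags are removed entirely unless asked otherwise.
        text = re.sub(r"#\w+", " ", text)
    # Replace punctuation with spaces, so "can't" becomes "can t"
    # as in the cleaned examples above.
    text = re.sub(r"[^\w\s]|_", " ", text)
    return " ".join(text.lower().split())

raw = ("RT @mmfa: Brain Freeze: Conservative media still using winter weather "
       "to attack global warming: http://bit.ly/9nKEcc #p2 #noisemachine #climate")
cleaned = clean_tweet(raw)
```

For the first example tweet, this sketch returns "brain freeze conservative media still using winter weather to attack global warming".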
One more step before feeding the data into the model
NA values in the existence column.
Duplicates after cleaning the data.
Watson's labels are sometimes contradictory.
A majority vote over duplicate tweets fixes the two problems above.
Final dataset size: 3263 tweets
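The deduplication step above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The function name `majority_vote` is hypothetical, NA labels are represented as `None`, and dropping tied tweets is an assumption since the slides do not say how ties were handled.

```python
from collections import Counter, defaultdict

def majority_vote(rows):
    """Collapse duplicate cleaned tweets to one label each.

    rows: iterable of (cleaned_tweet, label) pairs; label is None for NA.
    Returns a dict mapping each cleaned tweet to its majority label.
    """
    votes = defaultdict(Counter)
    for text, label in rows:
        if label is None:          # skip rows with NA in existence
            continue
        votes[text][label] += 1

    resolved = {}
    for text, counts in votes.items():
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue               # assumption: drop tweets with tied votes
        resolved[text] = ranked[0][0]
    return resolved

# Toy rows for illustration only; not taken from the dataset.
toy = [("a", "Y"), ("a", "Y"), ("a", "N"), ("b", None),
       ("c", "N"), ("d", "Y"), ("d", "N")]
labels = majority_vote(toy)
```

In the toy input, tweet "a" keeps label Y, "b" is dropped as NA, "c" keeps N, and "d" is dropped because its votes are tied.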
Methodology: LightSIDE
Evaluation of Results
Features | Accuracy | Kappa |
Uni + Bi + Tri + Line Length | .7852 | .5101 |
Uni + Bi + Tri + Line Length + Ignore All Stopwords | .7836 | .5084 |
Uni + Bi + Tri + Word/POS Pairs | .783 | .5081 |
Uni + Bi + Tri + Ignore All Stopwords | .7824 | .5062 |
Unigrams + Bigrams + Trigrams | .7818 | .5028 |
Unigrams + Bigrams | .78 | .4978 |
Uni + Line Length + Stem N-Grams | .7784 | .4888 |
Bigrams | .7775 | .4659 |
Uni + Stem N-Grams | .7772 | .4875 |
Uni + Bi + Tri + Stem + Skip + Ignore | .776 | .492 |
Unigrams | .7754 | .4808 |
Uni + Bi + Tri + Word/POS Pairs + Line Length + Ignore All Stopwords | .7741 | .4925 |
Uni + Line Length + Ignore All Stopwords | .7729 | .4752 |
Uni + Bi + Tri + POS Bigrams + POS Trigrams | .7723 | .3774 |
Uni + Bi + Tri + Count Occurrences | .7705 | .481 |
Uni + Bi + Tri + Skip Stopwords | .7705 | .481 |
Trigrams | .7689 | .4109 |
Uni + POS Bigrams + POS Trigrams | .7582 | .437 |
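The Kappa column reports agreement corrected for chance, which is why a feature set can have similar accuracy but a much lower kappa (compare the POS bigram and trigram row, accuracy .7723 and kappa .3774). A minimal Python sketch of Cohen's kappa from a confusion matrix is shown here. The function name `cohens_kappa` and the example matrix are hypothetical; the slides do not include the actual confusion matrices.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    confusion[i][j] = number of items of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: the fraction on the diagonal (this is the accuracy).
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Expected chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical matrix for illustration only: accuracy 0.70, chance agreement 0.50.
kappa = cohens_kappa([[40, 10], [20, 30]])
```

With this hypothetical matrix, accuracy is 0.70 but kappa is only about 0.40, because half of the agreement would be expected by chance.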
Evaluation of Results: With Tags
Features | Accuracy (Without Tags) | Kappa (Without Tags) | Accuracy (With Tags) | Kappa (With Tags) |
Uni + Bi + Tri + Line Length | .7852 | .5101 | .7749 | .4864 |
Uni + Bi + Tri + Line Length + Ignore All Stopwords | .7836 | .5084 | .772 | .4816 |
Uni + Bi + Tri + Word/POS Pairs | .783 | .5081 | .7742 | .4901 |
Unigrams | .7754 | .4808 | .7698 | .4698 |
Evaluation of Results: Best Combination
Uni + Bi + Tri + Line Length
Class | Precision | Recall |
Y | .88 | .81 |
N | .62 | .73 |
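To compare the Y and N classes with a single number, precision and recall can be combined into an F1 score. The Python sketch below derives F1 from the reported precision and recall values. The slides do not report F1, so these figures are derived here rather than taken from the original results, and the function name `f1_score` is just an illustrative helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the best feature combination.
reported = {"Y": (0.88, 0.81), "N": (0.62, 0.73)}
f1 = {label: round(f1_score(p, r), 2) for label, (p, r) in reported.items()}
# f1 == {"Y": 0.84, "N": 0.67}
```

The gap between the two classes (about 0.84 for Y against 0.67 for N) suggests the N class is the harder one for this model.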
Discussion of Results: Best Combination
Easy classifications:
Harder classifications:
An even harder one:
Discussion of Results: Length
Discussion of Results: Best Combination
Conclusion
Further research could examine how views have changed over time
Questions?