1 of 28

OPINION MINING CLIMATE CHANGE TWEETS

By: Lily Lou, Lizzie Tong, Michael Yang, Haocheng Zhao

2 of 28

Introduction

  • Global warming is likely to reach 1.5 degrees Celsius between 2030 and 2052 if greenhouse gas (GHG) emissions continue at the current rate
    • More intense weather, destroyed ecosystems, crises over scarcity of water and natural resources, migration due to climate change
  • Goal: predict the viewpoints expressed in climate change tweets and, through exploratory analysis, find the topics that commonly appear when climate change is mentioned

3 of 28

Related Work

Pak, Alexander, and Patrick Paroubek. “Twitter as a Corpus for Sentiment Analysis and Opinion Mining,” 2010. https://lexitron.nectec.or.th/public/LREC-2010_Malta/pdf/385_Paper.pdf.

Neethu, M S, and R Rajasree. “Sentiment Analysis in Twitter Using Machine Learning Techniques.” In 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), 1–5, 2013. https://doi.org/10.1109/ICCCNT.2013.6726818.

4 of 28

Related Work

Mucha, Nithisha. “Sentiment Analysis of Global Warming Using Twitter Data,” 2018. https://library.ndsu.edu/ir/handle/10365/28166.

Reyes-Menendez, Ana, José Ramón Saura, and Cesar Alvarez-Alonso. “Understanding #WorldEnvironmentDay User Opinions in Twitter: A Topic-Based Sentiment Analysis Approach.” International Journal of Environmental Research and Public Health 15, no. 11 (November 2018): 2537. https://doi.org/10.3390/ijerph15112537.

5 of 28

Exploratory Topic Analysis

  • Word frequency queries in NVivo 12 Pro
  • Kept words with more than 50 instances
  • Grouped stemmed words and synonyms to proxy topics/arguments across tweets

6 of 28

CLIMATE CHANGE BELIEVERS

Word        Length  Count  Weighted Percentage (%)  Similar Words
climate     7       1553   6.43                     climat, climate
changing    8       1447   5.99                     change, changed, changes, changing
global      6       1127   4.67                     global, globally
warming     7       1081   4.48                     warm, warming, warms
new         3       135    0.56                     new

Word        Length  Count  Weighted Percentage (%)  Similar Words
snowing     7       85     0.50                     snow, snowing, snows
earth       5       82     0.48                     earth
report      6       82     0.48                     report, reporter, reporting, reports
energy      6       77     0.45                     energy
science     7       76     0.45                     science, sciences
scientists  10      68     0.40                     scientist, scientists
volcanoes   9       56     0.33                     volcano, volcanoes, volcanos

7 of 28

CLIMATE CHANGE DENIERS

Word        Length  Count  Weighted Percentage (%)  Similar Words
snowing     7       126    1.57                     snow, snowed, snowing, snows
science     7       59     0.74                     science
scientists  10      32     0.40                     scientist, scientists
weather     7       32     0.40                     weather
cold        4       29     0.36                     cold
hoax        4       28     0.35                     hoax
blizzard    8       27     0.34                     blizzard, blizzards

Word        Length  Count  Weighted Percentage (%)  Similar Words
global      6       790    7.26                     global
warming     7       785    7.21                     warm, warmed, warming
climate     7       332    3.05                     climate, climatism
change      6       290    2.66                     change, changes, changing
snowing     7       126    1.16                     snow, snowed, snowing, snows
tcot        4       97     0.89                     tcot
gore        4       74     0.68                     gore

8 of 28

Sentiment Prediction

  • What patterns within the features indicate whether or not the author believes in climate change?
  • Which features are most predictive?

  • Looked at different combinations of features:
    • Unigrams, bigrams, trigrams, count occurrences, stemming, line length
  • Compare results!
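
A minimal sketch of how the n-gram and line-length features listed above could be built in plain Python. This is not LightSide's implementation: the function names and the `__length__` feature name are hypothetical, and the document-frequency filter is only one possible reading of the "threshold of 3" mentioned on the results slides.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams in a token list, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=3, add_length=True):
    """Binary unigram..max_n-gram features plus an optional length feature."""
    tokens = text.split()
    feats = set()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    features = {f: 1 for f in feats}
    if add_length:
        # Hypothetical feature name; the value is the token count of the tweet.
        features["__length__"] = len(tokens)
    return features


def frequent_features(texts, threshold=3, max_n=3):
    """Keep n-grams that occur in at least `threshold` documents.

    Assumption: the threshold is a minimum document count.
    """
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(extract_features(text, max_n, add_length=False).keys())
    return {f for f, count in doc_freq.items() if count >= threshold}
```

For example, `extract_features("global warming is a hoax")` contains the unigram `hoax`, the bigram `global_warming`, and the trigram `global_warming_is`.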

9 of 28

Sentiment Analysis: Approaches

10 of 28

Approaches: Model

  • Built a Naive Bayes model
    • Previous literature indicated that Naive Bayes outperformed other methods such as Support Vector Machines (SVM), Conditional Random Fields, and Logistic Regression
  • Given the small number of instances, we evaluated using cross-validation
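
A minimal sketch of the approach above: a multinomial Naive Bayes classifier over unigrams, scored with k-fold cross-validation. The models in this project were built in LightSide, so this is only an illustration; the function names, add-one smoothing, and k=10 are assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count class and word frequencies for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in doc.split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs, labels, k=10, seed=0):
    """Return mean accuracy over k shuffled folds (requires len(docs) >= k)."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k
```

Averaging over folds lets every tweet serve as test data exactly once, which matters when only a few thousand labeled instances are available.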

11 of 28

Methodology: Dataset & Cleansing

Tweets containing the keywords “climate change” were scraped from Twitter (credit to IBM employees)

6,090 tweets in total. Columns: tweet, existence, existence.confidence

Watson labeled each tweet as expressing a negative or positive attitude towards climate change

12 of 28

What does a tweet in our dataset look like?

“RT @mmfa: Brain Freeze: Conservative media still using winter weather to attack global warming: http://bit.ly/9nKEcc #p2 #noisemachine #climate”

“"D.C. Snowstorm: How Global Warming Makes Blizzards Worse - TIME" ( http://bit.ly/9sMGXj ) It's snowing because it's so warm? Cahmaaahn.(via @TonyMackGD) [link]”

13 of 28

A tweet contains a lot of information we don’t need.

“RT @mmfa: Brain Freeze: Conservative media still using winter weather to attack global warming: http://bit.ly/9nKEcc #p2 #noisemachine #climate”

“"D.C. Snowstorm: How Global Warming Makes Blizzards Worse - TIME" ( http://bit.ly/9sMGXj ) It's snowing because it's so warm? Cahmaaahn. (via @TonyMackGD) [link]”

14 of 28

There are some things we are not certain about...

“RT @mmfa: Brain Freeze: Conservative media still using winter weather to attack global warming: http://bit.ly/9nKEcc #p2 #noisemachine #climate”

“"D.C. Snowstorm: How Global Warming Makes Blizzards Worse - TIME" ( http://bit.ly/9sMGXj ) It's snowing because it's so warm? Cahmaaahn. (via @TonyMackGD) [link]”

15 of 28

And... all the punctuation is unnecessary

“RT @mmfa: Brain Freeze: Conservative media still using winter weather to attack global warming: http://bit.ly/9nKEcc #p2 #noisemachine #climate”

“"D.C. Snowstorm: How Global Warming Makes Blizzards Worse - TIME" ( http://bit.ly/9sMGXj ) It's snowing because it's so warm? Cahmaaahn. (via @TonyMackGD) [link]”

16 of 28

17 of 28

After the cleaning process

“well the truth is brian we can t solve global warming because i fucking changed light bulbs in my house barack obama”

“un process in danger unless world agrees on climate change telegraph global warming is a fraud for world government”
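
A minimal sketch of a cleaning step that produces text in the style of the examples above. The exact rules used in the project are not listed on the slides, so the choices here are assumptions: URLs, "[link]" placeholders, "RT" markers, and @mentions are dropped, hashtag words are kept without the "#", and all remaining punctuation becomes whitespace. The function name `clean_tweet` is hypothetical.

```python
import re


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, retweet markers, mentions, punctuation."""
    text = re.sub(r"https?://\S+", " ", text)    # URLs
    text = re.sub(r"\[link\]", " ", text, flags=re.IGNORECASE)
    text = re.sub(r"\bRT\b", " ", text)          # retweet marker
    text = re.sub(r"@\w+", " ", text)            # @mentions
    text = text.lower()
    # Assumption: punctuation (including "#" and apostrophes) becomes a space,
    # which is why "can't" turns into "can t" in the cleaned examples.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())                # collapse repeated whitespace
```

Because distinct raw tweets can collapse to the same cleaned string, duplicates have to be handled after this step.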

18 of 28

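One more step before feeding the data into the model

Some rows have NA in the existence column.

Cleaning the data produces duplicate tweets.

Watson is sometimes contradictory (the same tweet can carry conflicting labels).

A majority vote resolves the duplicates and the conflicting labels.

Final dataset size: 3,263 tweets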
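
A minimal sketch of the majority vote described above, assuming each row is a (cleaned text, label) pair with labels "Y", "N", or None for NA. The function name is hypothetical, and dropping exact ties is an assumption; the slides do not say how ties were handled.

```python
from collections import Counter, defaultdict


def majority_vote(rows):
    """Collapse duplicate tweets to one label each by majority vote.

    rows: iterable of (cleaned_text, label) pairs; label is "Y", "N", or None.
    Returns a dict mapping each kept tweet to its winning label.
    """
    votes = defaultdict(Counter)
    for text, label in rows:
        if label is not None:  # skip rows with NA in the existence column
            votes[text][label] += 1

    resolved = {}
    for text, counts in votes.items():
        ranked = counts.most_common(2)
        # Assumption: exact ties are dropped rather than guessed.
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue
        resolved[text] = ranked[0][0]
    return resolved
```

A tweet labeled Y, Y, N keeps the label Y; a tweet labeled Y, N is dropped under the tie assumption.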

19 of 28

Methodology: LightSide

  • Once the data was cleaned, we used LightSide to extract features and to build and evaluate models
    • Data contained two columns: tweet text and sentiment (Y/N)
  • Straightforward to use; we exported results to Excel for further inspection
    • Looked at false negatives and false positives

20 of 28

Evaluation of Results

  • We used accuracy as the primary metric for evaluating features
    • Keeping in mind that 70% of instances are positive (belief in climate change)
      • So we also looked at kappa, which corrects for chance agreement
    • Neither false positives nor false negatives are especially costly in this context
      • Precision and recall were treated as secondary
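
A short sketch of why kappa is reported next to accuracy. It computes accuracy and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the label distributions. The function name is hypothetical.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement p_o is the plain accuracy.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e from the marginal label frequencies.
    expected = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n) for label in labels
    )
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return observed, kappa
```

With a 70/30 class split, a classifier that always predicts "Y" reaches 0.70 accuracy but a kappa of 0, so kappa shows how much of the accuracy comes from the features.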

21 of 28

Evaluation of Results

  • Used threshold of 3, original cleaned dataset

Features                                                               Accuracy  Kappa
Uni + Bi + Tri + Length                                                .7852     .5101
Uni + Bi + Tri + Length + Ignore all stopwords                         .7836     .5084
Uni + Bi + Tri + Word/POS Pairs                                        .783      .5081
Uni + Bi + Tri + Ignore all stopwords                                  .7824     .5062
Unigrams + Bigrams + Trigrams                                          .7818     .5028
Unigrams + Bigrams                                                     .78       .4978
Uni + Line Length + Stem N-Grams                                       .7784     .4888
Bigrams                                                                .7775     .4659
Uni + Stem N-Grams                                                     .7772     .4875
Uni + Bi + Tri + Stem + Skip + Ignore                                  .776      .492
Unigrams                                                               .7754     .4808
Uni + Bi + Tri + Word/POS Pairs + Line Length + Ignore all stopwords   .7741     .4925
Uni + Length + Ignore all stopwords                                    .7729     .4752
Uni + Bi + Tri + POS Bigrams + POS Trigrams                            .7723     .3774
Uni + Bi + Tri + Count Occurrences                                     .7705     .481
Uni + Bi + Tri + Skip Stopwords                                        .7705     .481
Trigrams                                                               .7689     .4109
Uni + POS Bigrams + POS Trigrams                                       .7582     .437

22 of 28

Evaluation of Results: With Tags

  • Used threshold of 3, original cleaned dataset

Features                                        Accuracy     Kappa        Accuracy        Kappa
                                                (with tags)  (with tags)  (without tags)  (without tags)
Uni + Bi + Tri + Length                         .7749        .4864        .7852           .5101
Uni + Bi + Tri + Length + Ignore all stopwords  .772         .4816        .7836           .5084
Uni + Bi + Tri + Word/POS Pairs                 .7742        .4901        .783            .5081
Unigrams                                        .7698        .4698        .7754           .4808

23 of 28

Evaluation of Results: Best Combination

Unigram + Bigram + Trigram + Length

Class  Precision  Recall
Y      .88        .81
N      .62        .73

24 of 28

Discussion of Results: Best Combination

Easy classifications:

  • “yeah about that global warming footprint al you hypocritical jackass”
  • “some global warming huh you were all lied to by al gore he duped you”

Harder classifications:

  • “wsj editorial today argues for fighting real natural disasters like tsunamis rather than man made ones like global warming” - false negative
  • “what global warming #3wordslibshate” - false positive
  • “tulips in mid april in chicago love the global warming” - false negative

An even harder one:

  • “vry inrstg dc ok already we believe in climate change snow conts 2 pile up what s the reason” - true positive?

25 of 28

Discussion of Results: Length

  • Shorter tweets introduce more variance in the features, lowering accuracy
    • “tulips in mid april in chicago love the global warming” - false negative
    • “what global warming #3wordslibshate” - false positive
  • Longer tweets are a little bit easier:
    • “larry brilliant at tedxvolcano we have to fight for science the single most important thing we face is climate change socmedia environment”

26 of 28

Discussion of Results: Best Combination

  • Informational pieces were misclassified as negative (false negatives) at higher rates
    • “ultimate ebook store shocking information about global warming”
    • “wsj editorial today argues for fighting real natural disasters like tsunamis rather than man made ones like global warming”
    • “world famous places endangered by global warming if climatologists predictions of global warming are right some”
    • “utah house passes resolution implying climate change conspiracy solveclimate com”
  • Tough to tell what people are actually saying about climate change (e.g. sarcasm)
    • “the only climate change that exists is summer fall winter spring just saying”

27 of 28

Conclusion

  1. Certain topics and slang are common in particular classes, which our topic analysis appears to corroborate
    1. Al Gore is a common topic amongst non-believers
  2. Short tweets increase variability and decrease accuracy
  3. Interestingly, unigrams alone were outperformed by a combination of trigrams, bigrams, and unigrams
    • Word context matters in this domain: few unigrams carried high information on their own

Further research: how views have changed over time

28 of 28

Questions?