CLIMATE-RELATED TWITTER FEEDBACK ANALYSIS USING BERT
By: Yihang Hu, Qingcheng Wei, Zinan Zhang, Chengze Xie
GOAL OF THIS PROJECT
Why did we choose this project?
1. Facilitate user retention and interaction
2. Promote positive tweets
Benefits of this project
1. Facilitate user retention and interaction
Whenever a user posts a tweet, the system uses our model to predict whether the tweet will be liked.
Based on this prediction, the system ranks tweets predicted to attract many likes above others when users search for climate- or environment-related topics.
This facilitates user interaction with the platform and improves user stickiness.
GOAL OF THIS PROJECT
2. Promote positive tweets
Popular tweets are likely to cover positive and substantive environment-related topics.
With positive tweets pushed to the top, public awareness of climate change can be raised.
DATA GATHERING
1. We use the Tweepy package to query the Twitter API, providing each tweet's specific ID number.
2. We clean the table by removing all NaN values and organizing the data into texts and labels, which serve as our X and Y values.
3. We save the data to a CSV file for future reference and feed it into our pre-trained model (steps 2 and 3 are sketched after this list).
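A minimal sketch of the cleaning and saving steps, assuming the raw tweets have already been collected into a pandas DataFrame; the column names and the liked/not-liked labeling rule are placeholder assumptions, not fixed by the slides.

```python
# A sketch of the cleaning and saving steps; column names and the
# labeling rule (liked vs. not liked) are placeholder assumptions.
import pandas as pd

def clean_and_save(df: pd.DataFrame, path: str = "climate_tweets.csv") -> pd.DataFrame:
    """Drop NaN rows, build Texts (X) and Labels (Y), and save to CSV."""
    df = df.dropna(subset=["text", "like_count"])      # remove all NaN values
    df["label"] = (df["like_count"] > 0).astype(int)   # placeholder labeling rule
    out = df.rename(columns={"text": "Texts", "label": "Labels"})[["Texts", "Labels"]]
    out.to_csv(path, index=False)                      # save for future reference
    return out
```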
DATA GATHERING: Example
1. Given a tweet ID number: 1028957353322762240
2. Use the ID as the parameter for the Tweepy API request functions get_tweets() & get_liking_users()
3. We get the tweet text and like count associated with this ID (a code sketch follows this example):
Text: “An eye-opening article. This further reinforces the need to switch to a more enviroment friendly lifestyle.\n@EamonRyan thank you for sharing this!”
Label: 0
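A minimal sketch of this retrieval step using Tweepy's v2 Client; "BEARER_TOKEN" is a placeholder for a real credential.

```python
# A sketch of the retrieval step with Tweepy's v2 Client
# ("BEARER_TOKEN" is a placeholder for a real credential).
import tweepy

client = tweepy.Client(bearer_token="BEARER_TOKEN")
tweet_id = 1028957353322762240

# get_tweets() returns the text and, with public_metrics, the like count.
resp = client.get_tweets(ids=[tweet_id], tweet_fields=["public_metrics"])
tweet = resp.data[0]
print(tweet.text)                          # the tweet text
print(tweet.public_metrics["like_count"])  # the liking number

# get_liking_users() returns the users who liked the tweet.
likers = client.get_liking_users(tweet_id)
```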
DISPLAY THE DATASET
Pre-trained Model
BERT Tokenizer
With this BERT-specific tokenizer, we transform each text into a sequence of token IDs via BERT's vocabulary lookup table (sketched below).
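A minimal sketch of the tokenization step; the bert-base-uncased checkpoint, maximum length, and padding strategy are assumptions.

```python
# A sketch of the tokenization step; the checkpoint name, max_length,
# and padding strategy are assumptions.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "An eye-opening article. This further reinforces the need ..."
encoded = tokenizer(
    text,
    padding="max_length",  # pad every sequence to the same length
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(encoded["input_ids"])       # token IDs from BERT's vocabulary lookup table
print(encoded["attention_mask"])  # 1 for real tokens, 0 for padding
```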
Why choose the BERT transformer?
BERT Transformer vs. RNN/LSTM: BERT reads the whole sequence at once with bidirectional self-attention and starts from pre-trained language representations, while RNN/LSTM models process tokens one by one and capture long-range context less well.
MODEL
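A minimal sketch of the classifier, assuming a pre-trained BERT encoder with a binary classification head (BertForSequenceClassification); the exact checkpoint and head configuration are not spelled out above, so treat these as illustrative choices.

```python
# A sketch of the classifier: pre-trained BERT with a binary
# classification head (the checkpoint is an assumption).
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # liked (1) vs. not liked (0)
)

enc = tokenizer("An eye-opening article. ...", return_tensors="pt",
                truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**enc).logits   # shape: (1, 2)
pred = int(logits.argmax(dim=-1))  # predicted label for the tweet
```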
FINAL OUTPUT
The figure on the right shows the average training loss in each epoch. Based on our experiments, we obtained a test accuracy of 83.82%.
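A minimal sketch of how these numbers can be computed: the average training loss per epoch and the overall test accuracy. It reuses the model from the sketch above; train_loader, test_loader, the epoch count, and the AdamW learning rate are placeholder assumptions.

```python
# A sketch of the training/evaluation loop (reuses `model` from the
# sketch above; train_loader and test_loader are assumed DataLoaders
# yielding dicts with input_ids, attention_mask, and labels).
import torch
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=2e-5)  # placeholder learning rate

for epoch in range(4):  # placeholder epoch count
    model.train()
    total_loss = 0.0
    for batch in train_loader:
        optimizer.zero_grad()
        out = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
        out.loss.backward()
        optimizer.step()
        total_loss += out.loss.item()
    # average training loss in this epoch, as plotted in the figure
    print(f"epoch {epoch}: avg training loss {total_loss / len(train_loader):.4f}")

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for batch in test_loader:
        logits = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        correct += (logits.argmax(dim=-1) == batch["labels"]).sum().item()
        total += batch["labels"].size(0)
print(f"test accuracy: {correct / total:.2%}")  # e.g. 83.82%
```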
Difficulties while fitting the model
Each API request can retrieve only a limited number of tweets, since we only have the most basic developer access, so the downloading process is extremely long.
The tokenization step is time-consuming: after preprocessing (such as removing special characters), every original tweet has to be converted into a list of numbers for both training and testing.
The optimizer has to be chosen carefully for NLP, and because the original posts vary widely (e.g., emoji or languages other than English), error handling such as try/except has to be implemented (see the sketch below).
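A minimal sketch of the try/except guard described above; the exception class caught and the skip-on-failure policy are assumptions.

```python
# A sketch of the try/except guard: skip tweets whose retrieval or
# content handling fails (the caught exception class is an assumption).
import tweepy

def fetch_text(client: tweepy.Client, tweet_id: int):
    """Return the tweet text, or None if the request fails."""
    try:
        resp = client.get_tweets(ids=[tweet_id])
        return resp.data[0].text if resp.data else None
    except tweepy.TweepyException:
        # network, rate-limit, or permission errors: skip this tweet
        return None
```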
References
Pre-trained model acquisition:
1. https://github.com/avishreekh/Depression-detection-using-Twitter-posts
Data acquisition:
2. Climate Change Tweets Ids, GWU Libraries Dataverse (Harvard Dataverse)
Q & A