1 of 34

Sentiment Analysis of Chinese Microblog Based on Stacked Bidirectional LSTM

Yue Lu1, Junhao Zhou1, Hong-Ning Dai1, Hao Wang2, Hong Xiao3

1 Macao University of Science and Technology

2 Norwegian University of Science and Technology

3 Guangdong University of Technology


2 of 34

Outline

  • Motivation

  • The Proposed Method

  • Experiments

  • Conclusion



4 of 34

Sentiment Analysis

  • To identify the sentiment orientation of microblog texts

[Diagram: microblog texts are fed into the sentiment analysis model, which classifies them as Positive or Negative]

5 of 34

Related Works - Word Representation

Feature engineering

• Hand-crafted features

• Sentiment lexicons

Problem

  • Time- and effort-consuming
  • Weibo has special sentiment-bearing words that are updated daily and hard to collect

[Example sentiment lexicon scores: excited 0.641, satisfied 0.531, cool 0.375, thought 0.266, make 0.063, sadly -0.266, unhappy -0.531, annoying -0.719]

6 of 34

Can these features be manually designed?

E1. 为祖国疯狂打call

( Cheer for my country! )

E2. 陈独秀请你坐下

( Your idea was quite brilliant! Literally "Chen Duxiu, please sit down", an internet meme used to tease a clever remark. )

Traditional feature engineering cannot encode semantic features automatically


7 of 34

Related Works - Document Representation

Sentiment Analysis

  • Machine learning model
  • Non-RNN based deep learning model

Problem

  • Overlook the effect of long-range dependencies between Chinese words (contextual features) on the sentiment analysis task

[Example word-level sentiment scores for "have a nice day": have 0.000, a 0.000, nice 0.719, day 0.094]

8 of 34

Can the sentiment orientation of the middle two sentences be correctly identified without referring to the context?

E3. 为什么要这么苛刻呢?8 分钟展现出这么多中国元素,中国科技,展现出中国的热情和自信。张艺谋导演真的是鞠躬尽瘁了。搞不懂这些人!

( Why are they so mean? This 8-minute show exhibited so many Chinese elements, Chinese technologies as well as our people's enthusiasm and confidence. The director Zhang Yimou has already tried his best. I really can't understand these people! )

Traditional non-RNN based methods cannot deal with long-range dependencies of words (contextual features).

[Diagram: without referring to the context, a non-RNN based model fails to identify the sentiment correctly (output: Positive ×)]

9 of 34

Challenges for Sentiment Analysis of Chinese Microblogs

  • Traditional feature engineering cannot automatically encode semantic features

  • Traditional non-RNN based methods cannot deal with long-range dependencies of words (contextual features)


10 of 34

Outline

  • Motivation

  • The Proposed Method

  • Experiments

  • Conclusion


11 of 34

Overview of Methodology

[Architecture diagram: CBOW encodes semantic features of words; Stacked Bi-LSTM extracts contextual features of documents]

12 of 34

Overview of Methodology

[Architecture diagram, highlighting the CBOW component that encodes semantic features]

13 of 34

Continuous Bag-of-Words (CBOW)

  • A neural network based word representation (embedding) model
  • Proposed by Mikolov et al. in 2013
  • Aims to automatically encode semantic features according to the logic of natural language
  • The distance between two word vectors depends on their
    • semantic similarity
    • contextual similarity

[Diagram: related words such as "Beijing" and "Olympics", or "remarkable", "excellent", "brilliant", and "perfect", lie close together in the embedding space; each word is encoded as a 100-dimensional column vector (100 elements)]
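To make this concrete, a minimal sketch of training a CBOW model with the gensim library (the library choice, the toy corpus, and all hyperparameters other than the 100-dimensional vectors are illustrative assumptions, not the authors' reported setup):

```python
from gensim.models import Word2Vec

# Toy corpus of pre-segmented Weibo comments (hypothetical examples;
# the paper trains CBOW on 65,536 comments).
sentences = [
    ["北京", "奥运", "开幕式", "精彩"],
    ["张艺谋", "导演", "真的", "尽力", "了"],
]

# sg=0 selects CBOW (predict the current word from its context words);
# vector_size=100 matches the 100-dimensional vectors on the slide.
model = Word2Vec(sentences, sg=0, vector_size=100, window=5, min_count=1)

# Each word is now a 100-dimensional vector; semantically and
# contextually similar words lie close together in this space.
vec = model.wv["精彩"]                 # numpy array with 100 elements
print(model.wv.most_similar("精彩"))   # nearest neighbors in the embedding space
```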

14 of 34

Overview of Methodology

[Architecture diagram, highlighting the Stacked Bi-LSTM component that extracts contextual features]

15 of 34

Recurrent neural network (RNN)

Problem:

An RNN stores all the information from previous inputs without filtering out useless information, so it cannot handle long-range dependencies of words.

[Diagram: an unrolled RNN mapping a sequence of inputs to a sequence of outputs]
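For reference, the vanilla RNN recurrence in its standard form (weight names follow common convention): the entire history is compressed into a single hidden state at each step, with no gating to discard useless information:

```latex
\begin{align}
h_t &= \tanh\!\left(W_h x_t + U_h h_{t-1} + b_h\right) \\
y_t &= W_y h_t + b_y
\end{align}
```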

16 of 34

Long Short-Term Memory (LSTM) [1]

Memory Unit of LSTM [2]

[Diagram: the memory unit forgets useless information and memorizes useful information]

[1] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.

[2] Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.


17 of 34

Sentiment Analysis Based on Stacked Bi-LSTM

[Diagram: a forward LSTM captures past contexts, a backward LSTM captures future contexts, and their outputs are combined]
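In symbols, the bidirectional combination is commonly written as the concatenation of the forward and backward hidden states (a standard formulation; the slide does not spell out the combination operator):

```latex
h_t = \left[\, \overrightarrow{h_t} \,;\, \overleftarrow{h_t} \,\right]
```

Here the forward state reads the sequence left to right (past contexts) and the backward state reads it right to left (future contexts).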

18 of 34

Stacked Bidirectional LSTM Model

[Diagram: the stacked bidirectional LSTM is built from LSTM cells; each cell keeps useful information and forgets useless information]
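To make the architecture concrete, a minimal Keras sketch of a stacked bidirectional LSTM classifier (sequence length, hidden units, and depth are illustrative assumptions, not the authors' reported configuration):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Stacked bidirectional LSTM for binary sentiment classification.
model = keras.Sequential([
    # Input: a comment as a sequence of 50 CBOW word vectors of dimension 100.
    keras.Input(shape=(50, 100)),
    # First Bi-LSTM layer returns the full sequence so that a second
    # bidirectional layer can be stacked on top of it.
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # Second Bi-LSTM layer combines past and future contexts into one state.
    layers.Bidirectional(layers.LSTM(64)),
    # Sigmoid output: probability that the comment is positive.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```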

19 of 34

Sentiment Analysis Based on Stacked Bi-LSTM

[Diagram: the combined representation from the Stacked Bi-LSTM is fed to a classifier for sentiment prediction]

20 of 34

Outline

  • Motivation

  • The Proposed Method

  • Experiments

  • Conclusion


21 of 34

Experiment Setting

  • Data
    • 65,536 comments for training CBOW model
    • 3,000 labeled comments for training Stacked Bi-LSTM
      • 1,514 positive comments
      • 1,486 negative comments

  • Experimental Performance Evaluation
    • Descriptive statistics of testing data set

    • Evaluation metric (see the sketch below)

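The metric definitions are not reproduced on the slide; a minimal sketch, assuming the standard accuracy, precision, recall, and F1 metrics for binary sentiment classification:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical labels: 1 = positive comment, 0 = negative comment.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  "
      f"recall={recall:.3f}  f1={f1:.3f}")
```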

22 of 34

Results

  • The proposed method (CBOW + Stacked Bi-LSTM) outperforms all of the other compared methods


23 of 34

Influence of Different Factors

  • The performance of the Stacked Bi-LSTM model with different numbers of hidden cells


24 of 34

Influence of Different Factors

  • The performance of the Stacked Bi-LSTM model with different input sizes


25 of 34

Outline

  • Motivation

  • The Proposed Method

  • Experiments

  • Conclusion


26 of 34

Conclusion

  • Encode semantic properties of words using the CBOW model

  • Abstract contextual features of documents using the Stacked Bi-LSTM model

  • The effectiveness of CBOW + Stacked Bi-LSTM has been verified on sentiment prediction for Chinese microblog comments


27 of 34

Thanks


28 of 34

29 of 34

Influence of Different Factors

  • The performance of the Stacked Bi-LSTM model with different numbers of layers


30 of 34

Corpus Construction

Preprocessing of raw microblog text:

Removing hashtags, reply symbols, references to user names (@user), and links.

Chinese word segmentation and stop-word removal
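A minimal preprocessing sketch, assuming the jieba segmenter and a hypothetical stop-word list (the authors' exact tooling is not stated on the slide):

```python
import re
import jieba  # widely used Chinese word segmenter (an assumption, not stated in the deck)

# Hypothetical stop-word list; in practice it is loaded from a file.
STOP_WORDS = {"的", "了", "是"}

def preprocess(text: str) -> list:
    # Remove hashtags (Weibo topics use the #topic# form), reply symbols
    # and user mentions (@user), and links.
    text = re.sub(r"#[^#]*#", "", text)
    text = re.sub(r"@\S+", "", text)
    text = re.sub(r"https?://\S+", "", text)
    # Segment into Chinese words and drop stop words.
    return [w for w in jieba.cut(text) if w.strip() and w not in STOP_WORDS]

print(preprocess("#冬奥会# 为祖国疯狂打call @user https://t.cn/xyz"))
```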

31 of 34

Skip-gram

[Diagram: Skip-gram predicts the surrounding context words from the current word (Mikolov et al., 2013)]

32 of 34

Continuous Bag-of-Words (CBOW)

[Diagram: CBOW predicts the current word from its surrounding context words (Mikolov et al., 2013)]

33 of 34

Long Short-Term Memory (LSTM) [1]

Memory Unit of LSTM [2]

[1] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.

[2] Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.


Notation: Ct-1 is the old cell state, Ct the new cell state, ht-1 the output from the previous hidden layer, xt the current input, and ht the output of the current hidden layer.

  • Forget gate: what information to dump from the cell state

  • Input gate: what new information to store in the cell state

  • New memory: new candidate values after adding the new input

  • New cell state: updates the old cell state Ct-1 into the new cell state Ct

  • Output gate: decides which parts of the cell state to output

  • Output of the current hidden layer: puts the cell state through the tanh function and multiplies it by the output of the sigmoid gate
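In equation form, this is the standard LSTM formulation from [1] (weight matrices W and bias vectors b follow common convention; σ denotes the sigmoid function and ⊙ element-wise multiplication):

```latex
\begin{align}
f_t &= \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{new memory (candidate values)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{new cell state} \\
o_t &= \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{output of current hidden layer}
\end{align}
```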

34 of 34

Stacked Bidirectional LSTM Model

[Diagram: the stacked bidirectional LSTM built from LSTM cells; sigmoid gates decide which information is to be kept and which is to be forgotten]