1 of 58

B. Tech. Project

TOWARDS TARGET-AWARE TWITTER STANCE DETECTION

17CS10006: Ayush Kaushal

Prof. Niloy Ganguly (Supervisor)

2 of 58

Introduction

NAACL-HLT 2021

Introduction

Spurious Cues

Target-Aware Stance

Conclusion

Part of the work in this thesis will appear in the main track of NAACL-HLT 2021.

tWT–WT: A Dataset to Assert the Role of Target Entities for Detecting Stance of Tweets

- Ayush Kaushal, Avirup Saha and Niloy Ganguly

Preprint

  • Target-oblivious classifiers can deliver impressive performance.
  • Datasets contain sentiment-stance and lexical spurious cues.
  • Proposed the target-aware tWT–WT (targeted WT–WT) dataset.

Abstract

3 of 58

Introduction

Stance Detection


* Text portion of the Tweet example taken from SemEval 2016 task 6 dataset

4 of 58

Introduction


Applications of Stance Detection

Analysing Debates

Sentiment Analysis

Detecting Fake News

Verifying Rumours

5 of 58

Introduction

Stance Detection Systems


6 of 58

Introduction

Spurious Cues in Datasets


Example:

Visual Question Answering[1]

Q. What is the colour of sky?

Ans. Blue

Cue: Generic truth

Q. Does the man have legs in the air?

Ans. Yes

Cue: Nature of questions annotators ask.

[1] Y. Goyal et al. 2017. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. CVPR 2017


7 of 58

Introduction

Spurious Cues in Datasets


8 of 58

Introduction

Role of Targets in Detecting Stance


* The text portion of the annotated example is taken from WT-WT dataset.


9 of 58

Introduction

Targets as free-form sentences


* Text portions of Tweet and Target are taken from RumourEval 2017 dataset


10 of 58

Introduction

Variants of Twitter Stance Detection

  • We considered at least one dataset of each type


Stance Detection

  • Targets are Fixed Entities
  • Test and train on same targets

Multi-target

  • Targets are a pair of Fixed Entities
  • Test and train on same targets

Cross-target

  • Targets are Fixed Entities
  • Test and train on different targets

Rumour Stance

  • Targets are free-form rumour claims
  • Test and train on different claims


11 of 58

Introduction

Demonstrating Spurious Cues in Twitter Stance Detection Datasets.

Creating new dataset benchmarks for target-aware stance detection.

Investigating the datasets for spurious cues.

Re-evaluating systems for target-aware stance detection.

Contributions


12 of 58

Spurious Cues in Twitter Stance Detection Datasets.

13 of 58

Spurious Cues in Datasets

Overview:

  • Demonstrating Spurious Cues:
    • Impressive performance of target oblivious models.
    • Small performance gap between target-oblivious and target-aware models.
  • Nature of dataset biases:
    • Sentiment Correlations
    • Lexical correlations
    • Others: Tweet-length, Opinion


14 of 58

Spurious Cues in Datasets

Datasets Considered - 3/6

Will-They-Won’t-They (WT-WT)

01

> Cross Target

> Financial Domain (M&A)

> 50k+ Tweet-target pairs

SemEval 2016

Task-6

02

> Vanilla Stance Detection

> Various Domains - politics, movements, policy

> 4.1k Tweet-target pairs

M-T Multitarget

03

> Multi-target Stance

> Political domain

> 4.4k Tweet-target pairs


15 of 58

Spurious Cues in Datasets

Datasets Considered - 6/6

RumourEval 2017

04

> Rumour Stance Detection

> Disaster Domain Threads

> 5.5k Tweet-target pairs

RumourEval 2019

05

> Rumour Stance Detection

> Disaster Domain Threads

> Twitter + Reddit

> 8.5k Tweet-target pairs

Encryption Debate

06

> Vanilla Stance Detection

> Encryption Debate

> 3k Tweet-target pairs


16 of 58

Spurious Cues in Datasets

Very few examples of tweets with different targets.

Dataset        % of tweets with different targets
WT-WT          2%
SemEval16      0%
Rumour2017     0%
Rumour2019     0%
Multi-target   0.9%
Encryption     0%


17 of 58

Spurious Cues in Datasets

Obtaining the Datasets

Some of the datasets release only the tweet IDs:

> Scraped using the Twitter API* and Tweepy**

Dataset        Tweets scraped
WT-WT          45865 / 50210
Multi-target   2688 / 4413
Encryption     1634 / 2522

* developer.twitter.com

** tweepy.org


18 of 58

Spurious Cues in Datasets

  • Word De-contraction: Example - Don’t -> Do not

  • Removing URL, Emoji (😃😄😆😍), Punctuations

  • Word Segmenting
    • #ClimateChange -> [‘#’, ‘Climate’, ‘Change’]

  • Text normalizing - Lowercasing and Username normalization
    • @BarackObama -> @USER

  • Trimmed sentences to 99 tokens

Preprocessing

* Libraries used - ekphrasis, nltk
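The steps above can be sketched with plain regular expressions. This is illustrative only: the actual pipeline uses ekphrasis and nltk, and the contraction list here is a tiny sample.

```python
import re

# Illustrative regex version of the preprocessing steps above.
# The real pipeline uses ekphrasis/nltk; this contraction list is a tiny sample.
CONTRACTIONS = {"don't": "do not", "won't": "will not", "can't": "can not"}

def preprocess(tweet, max_tokens=99):
    text = tweet
    for short, full in CONTRACTIONS.items():          # word de-contraction
        text = re.sub(short, full, text, flags=re.IGNORECASE)
    text = re.sub(r"https?://\S+", "", text)          # remove URLs
    text = re.sub(r"@\w+", "@USER", text)             # username normalization
    # Hashtag segmentation on CamelCase: #ClimateChange -> # Climate Change
    text = re.sub(r"#(\w+)",
                  lambda m: "# " + re.sub(r"(?<=[a-z])(?=[A-Z])", " ", m.group(1)),
                  text)
    text = text.encode("ascii", "ignore").decode()    # drop emoji
    text = re.sub(r"[^\w\s@#]", " ", text)            # drop punctuation
    return " ".join(text.lower().split()[:max_tokens])  # lowercase + trim
```

For example, `preprocess("Don't miss @BarackObama on #ClimateChange! https://t.co/x")` yields `"do not miss @user on # climate change"`.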


19 of 58

Spurious Cues in Datasets

Setting up the Experiments

> Each datapoint is a tuple: (Tweet, Target, Stance)

> The target-oblivious model classifies using only the tweet.

> The target-aware model receives both the tweet and the target as input.

> Target-aware models should significantly outperform target-oblivious ones.


Images shown in this slide have been taken from Shutterstock.

20 of 58

Spurious Cues in Datasets

Target Aware Bert Model


This picture of Bert is taken from Sesame Street show after which Bert has been named.

21 of 58

Spurious Cues in Datasets

Target Oblivious Bert Model


This picture of Bert is taken from Sesame Street show after which Bert has been named.
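The only difference between the two baselines is the input they see. Schematically (the actual models feed these through the HuggingFace tokenizer rather than raw strings):

```python
def build_input(tweet, target=None):
    """Schematic BERT input. Target-oblivious: single segment.
    Target-aware: the standard two-segment sentence-pair encoding.
    (Illustrative only; real inputs come from the tokenizer.)"""
    if target is None:                               # target-oblivious
        return f"[CLS] {tweet} [SEP]"
    return f"[CLS] {tweet} [SEP] {target} [SEP]"     # target-aware
```

The sentence-pair form lets BERT attend jointly over the tweet and the target, which is exactly what the target-oblivious baseline is denied.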

22 of 58

Spurious Cues in Datasets


Domain-Specificity: Twitter

23 of 58

Spurious Cues in Datasets

Evaluation Metrics

01 Accuracy

> Fraction of labels correctly predicted.

02 Weighted Average F1

> Weighted average of per-class F1, with weights proportional to the number of examples in that class.

03 Macro Averaged F1

> F1 score is the harmonic mean of precision and recall.

> Macro F1 is a simple average of F1 across all the classes.

04 Human upper bound

> Provided for some datasets; used for comparison purposes only.
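The three automatic metrics can be computed from scratch as follows; this minimal sketch should agree with standard library implementations such as scikit-learn's `f1_score`.

```python
from collections import Counter

def f1(y_true, y_pred, cls):
    # Per-class F1: harmonic mean of precision and recall for one class.
    tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
    fp = sum(p == cls != t for t, p in zip(y_true, y_pred))
    fn = sum(t == cls != p for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    # Simple (unweighted) average of per-class F1.
    classes = set(y_true)
    return sum(f1(y_true, y_pred, c) for c in classes) / len(classes)

def weighted_f1(y_true, y_pred):
    # Average of per-class F1 weighted by class frequency.
    counts = Counter(y_true)
    return sum(n / len(y_true) * f1(y_true, y_pred, c) for c, n in counts.items())
```

On skewed class distributions (as in these datasets) macro F1 is the stricter metric, since rare classes count as much as frequent ones.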


24 of 58

Spurious Cues in Datasets

Results (part 1): WT-WT Dataset

Observations:

> Target oblivious Bert performs near or above human bounds.

> Little performance gains from considering targets.


25 of 58

Spurious Cues in Datasets

Results (part 2): WT-WT Dataset

Similar Observations:

> Target oblivious Bert performs near human bounds.

> Out-of-Domain (OOD): Massive performance drop.


26 of 58

Spurious Cues in Datasets

Results (part 3): SE16 and M-T Datasets

> Target oblivious Bert consistently gives > ⅔ accuracy.

> Performs well on all metrics, very close to target aware.


27 of 58

Spurious Cues in Datasets

Results (part 4)

Skewed class distributions:

  • Desired metric: Macro-F1

Target Oblivious:

> Above ⅔ accuracy score

> Impressive Macro-F1

> Performs near target aware


28 of 58

Spurious Cues in Datasets

Visualizing the Results


This plot was drawn using Matplotlib and Seaborn Libraries.

29 of 58

Spurious Cues in Datasets

Dataset Analysis:

  • Picked WT-WT dataset
    • Most recent
    • Largest dataset
  • Dataset details:
    • Targets are 5 merger-and-acquisition (M&A) events.
    • Stance class
      • Support - Tweet supports that the merger will happen
      • Refute - Tweet refutes that the merger will happen
      • Comment - Tweet neither supports nor refutes.
      • Unrelated - Tweet does not talk about the merger.
  • Analysis:
    • Lexicon-choice associated with stance
    • Sentiment-stance correlations


The Image shown in this slide is taken from VanillaLaw

30 of 58

Spurious Cues in Datasets

Dataset Analysis: Lexical Choice

  • Pointwise mutual information[1]

  • Practical considerations -
    • Stop-word removal.
    • Emphasis on highly discriminative word-class correlations:
      • Apply add-100 smoothing.

[1] Gururangan et al. Annotation artifacts in natural language inference data. NAACL 2018
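A minimal re-implementation of the word-class PMI statistic described above. The smoothing convention and tokenization here are assumptions; the thesis follows Gururangan et al., and the slide's setting corresponds to k=100.

```python
import math
from collections import Counter

def pmi(docs, labels, k=0):
    """PMI(word, class) = log [ p(word, class) / (p(word) * p(class)) ],
    with add-k smoothing of the joint count (k=0 means no smoothing).
    Illustrative re-implementation, not the thesis code."""
    joint, word_ct, class_ct, total = Counter(), Counter(), Counter(), 0
    for doc, lab in zip(docs, labels):
        for w in set(doc.split()):          # word presence, not frequency
            joint[(w, lab)] += 1
            word_ct[w] += 1
            class_ct[lab] += 1
            total += 1
    n_cells = len(word_ct) * len(set(labels))
    scores = {}
    for (w, c), n in joint.items():
        p_joint = (n + k) / (total + k * n_cells)
        scores[(w, c)] = math.log(p_joint / ((word_ct[w] / total) * (class_ct[c] / total)))
    return scores
```

Large k pushes the ranking toward words that are both frequent and strongly class-associated, which is the point of the add-100 step.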


31 of 58

Spurious Cues in Datasets

Dataset Analysis: Lexical Choice

Top 5 stance-wise lexicons according to PMI, along with the percentage of tweets of that stance class containing the word


Support              Refute            Comment              Unrelated
approves (3.3%)      urges (3.0%)      ceo (3.7%)           stocks (3.4%)
approve (5.1%)       blocked (5.5%)    healthcare (11.8%)   size (2.6%)
billion (26.2%)      sues (4.3%)       mean (2.3%)          merge (11.3%)
shareholder (0.7%)   blocks (4.8%)     merger (29.3%)       bid (19.0%)
close (6.4%)         block (21.8%)     trial (3.4%)         agreement (16.7%)

32 of 58

Spurious Cues in Datasets

Dataset Analysis: Sentiment and Stance

  • Used XLNet sentiment classifier:
    • Sentiment Range: [0, 1]
    • 0 -> most negative
    • 1 -> most positive

  • Observations:
    • Support, Refute -> Extremes
    • Comment, Unrelated -> Neutral

Class       Sentiment
Support     0.23
Refute      0.64
Comment     0.49
Unrelated   0.48

[1] Yang et al. XLNet: Generalized autoregressive pretraining for language understanding. NeurIPS 2019


33 of 58

Spurious Cues in Datasets

Dataset Analysis

  • Sentiment and lexicons can be spurious cues.

  • Various cues in other datasets:
    • RumourEval 2019
      • ‘?’ appears in 75% of ‘query’-stance tweets but only 11% of the rest.
      • 75% of ‘deny’-stance tweets have a sentiment score < 0.1.
    • SemEval 2016
      • 91.4% of tweets without opinion have ‘None’ stance.
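Cues like these are easy to quantify; a hypothetical helper (not from the thesis) that measures how predictive a surface feature is of a class:

```python
def cue_rate(tweets, labels, cue, cls):
    """Fraction of class-`cls` tweets containing `cue`, vs. the rest.
    A large gap (e.g. 75% vs 11% for '?' in 'query' tweets) signals a
    spurious cue a classifier can exploit without reading the target."""
    in_cls = [cue in t for t, l in zip(tweets, labels) if l == cls]
    rest = [cue in t for t, l in zip(tweets, labels) if l != cls]
    return sum(in_cls) / len(in_cls), sum(rest) / len(rest)
```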


34 of 58

Spurious Cues in Datasets

Dataset Analysis: Length Correlation

Much weaker length correlation than reported in previous work.[1]


[1] Gururangan et. al. Annotation artifacts in natural language inference data. NAACL 2018

35 of 58

Spurious Cues in Datasets

Dataset Analysis: Length Correlation

  • Tweets with unrelated stance are somewhat longer.
  • Two peaks due to tweet length constraints.


36 of 58

Spurious Cues in Datasets

Dataset Analysis: Length Correlation

  • Query stance has a peculiar distribution over the tweet length.


37 of 58

Towards Target Aware Twitter Stance Detection

38 of 58

Target Aware Stance Detection

Overview

  • Motivations from previous section:
    • Presence of spurious cues - sentiment and lexicons
    • These cues aid target oblivious models

  • Overview of this section:
    • Dataset creation process
    • Re-evaluating stance detection systems


39 of 58

Target Aware Stance Detection

Dataset Creation Method

  • Augment WT-WT dataset:
    • Largest and most recent
    • Annotated by experts

  • Reasoning:
    • Aim to handle the sentiment and lexicon correlations
    • Target-oblivious models will fail if the stance varies with the target.


40 of 58

Target Aware Stance Detection

Augmenting procedure - part 1

  • Remove the sentiment-stance correlation
    • Create negated targets for refute and support stance class.
      • <buyer> buys <target> → <buyer> not buys <target>

Result: nearly the same sentiment score for each class.

Class       Sentiment
Support     0.44
Refute      0.44
Comment     0.49
Unrelated   0.48
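The negation step can be sketched as follows, assuming (as the construction implies) that negating the target flips support to refute and vice versa. The target template and helper names are illustrative, not the thesis code.

```python
def negate(target):
    # '<buyer> buys <target>' -> '<buyer> not buys <target>'
    # (insert 'not' after the first word, as in the slide's template)
    first, rest = target.split(" ", 1)
    return f"{first} not {rest}"

FLIP = {"support": "refute", "refute": "support"}

def augment(tweet, target, stance):
    # Negated-target copy for support/refute examples; the label flips with it.
    if stance in FLIP:
        return (tweet, negate(target), FLIP[stance])
    return (tweet, target, stance)
```

Because every support example gains a refute twin with the same tweet text (and vice versa), tweet-level sentiment no longer predicts the stance label.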


41 of 58

Target Aware Stance Detection

Augmenting procedure - part 2 and 3

  • Address the lexicon-stance associations
    • For each tweet with only one labelled target, label one other target randomly with ‘unrelated’ stance.
  • Balance class distributions:
    • Create negated targets for 50% ‘comment’ & ‘unrelated’ stance



42 of 58

Target Aware Stance Detection

Targeted WT–WT Dataset Statistics

  • 111596 tweet-target pairs
  • At least 10000 data points for each merger target
  • Balanced class ratio for support : refute : comment : unrelated of 1:1:3:5

  • Similarly augment the SemEval 2016 and Multi-target datasets.


43 of 58

Target Aware Stance Detection

Maximum Accuracy of Target Oblivious Classifiers

Theorem: The maximum possible accuracy for any deterministic target-oblivious stance classifier is:

    Acc_max = [ Σi count(ti) · maxj p(sj | ti) ] / [ Σi count(ti) ]

where:

  • T = {t1, t2, . . . tn} -> set of tweets; S = {s1, s2, . . . sm} -> set of stances.
  • count(ti) -> number of targets labelled for ti
  • p(sj | ti) -> fraction of the targets of ti labelled with stance sj


Dataset          Max. accuracy
Targeted WT-WT   0.722
Targeted SE16    0.551
Targeted M-T     0.506
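The bound follows from the fact that a tweet-only classifier must output one stance per tweet, so at best it matches the majority stance over that tweet's targets. A sketch under that reading:

```python
def max_oblivious_accuracy(stances_per_tweet):
    """Upper bound on accuracy for any deterministic target-oblivious
    classifier. `stances_per_tweet` maps each tweet to the list of gold
    stances, one per labelled target of that tweet."""
    total = correct = 0
    for stances in stances_per_tweet.values():
        total += len(stances)                                   # count(ti)
        correct += max(stances.count(s) for s in set(stances))  # best single guess
    return correct / total
```

For example, a tweet labelled support for one target and unrelated for another contributes at most 1 of 2 correct, whatever the classifier outputs.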

44 of 58

Target Aware Stance Detection

Experiments with Targeted datasets

Baselines:

  • Target Oblivious Bert: Same as Before
  • Target Aware Bert: Same as Before
  • SiamNet: Siamese networks with Bert
  • TAN: Target-specific Attention Networks with Bert

Metrics:

  • Same as the non-targeted counterparts.


45 of 58

Target Aware Stance Detection

SiamNet + Bert
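A toy version of the Siamese idea, with a bag-of-words overlap standing in for the shared BERT encoder. This is purely illustrative: the real model compares learned encodings of tweet and target, not token counts.

```python
import math
from collections import Counter

def encode(text):
    # Stand-in for the shared encoder: a bag-of-words vector.
    return Counter(text.lower().split())

def similarity(tweet, target):
    # Cosine similarity between the two encodings; in SiamNet this
    # tweet-target comparison feeds the stance classifier.
    u, v = encode(tweet), encode(target)
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

The key design point is the shared encoder: both inputs are mapped into the same space, so the comparison itself is what carries the target information.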


46 of 58

Target Aware Stance Detection

TAN + Bert


47 of 58

Target Aware Stance Detection

Experiments with Targeted WT–WT (Part-1)

Observations:

  • Target Oblivious Bert performs very poorly.
  • Target Aware Bert performs the best with a lot of scope for improvement.


48 of 58

Target Aware Stance Detection

Experiments with Targeted WT–WT (Part-2)

Observations:

  • Target Oblivious Bert performs very poorly.
  • Target Aware Bert performs the best with a lot of scope for improvement.


49 of 58

Target Aware Stance Detection

Experiments (Part-3)

Observations:

> Target Oblivious Bert performs poorly.

> Target Aware Bert performs the best.

> SiamNet comes very close to Target Aware Bert.

> TAN performs very poorly.


50 of 58

Target Aware Stance Detection

Experiments Overview


51 of 58

Conclusions and Future Work

52 of 58

Conclusions & Future Work

Conclusions:

  • Empirically demonstrated spurious cues in Twitter stance detection datasets that inflate the performance of models.

  • Investigated the datasets for spurious cues, finding sentiment-stance and lexicon-stance correlations. Useful for future dataset creation.

  • Proposed an augmentation method for removing spurious cues, creating the largest stance detection dataset.

  • Re-evaluated systems to show the usefulness of the new datasets. Room for future work on stance detection systems.


53 of 58

Conclusions & Future Work

Future Work:

  • Explainable Stance Detection Systems.

  • Analysis of Multi-lingual stance datasets.

  • Target-aware Stance Detection Systems, reasoning about the target entities.


54 of 58

Conclusions & Future Work

Future Work: Visualization

Target Aware trained on Targeted-WTWT


55 of 58

Conclusions & Future Work

Future Work: Visualization

Target Aware Bert trained on WTWT


56 of 58

Conclusions & Future Work

Code and Trained models

  • Links to Trained model on respective repositories.

  • Detailed Readme and Environment Configurations.


The pictures for Octocat, Pytorch logo and Huggingface logo are taken from their respective GitHub organizations.

57 of 58

Conclusions & Future Work

Leaderboard and Dataset

wtwtv2-dataset.github.io/


The leaderboard website is inspired by Squad, HotPotQA and HoVer dataset leaderboards.

58 of 58

Thank you

Slide Template partial credit: SlidesCarnival