1 of 29

How to disagree well?

Presenter: YW

Annotator: HK

Authors: CK, TS, AV

Paper: https://aclanthology.org/2022.emnlp-main.252/

2 of 29

Classical Theories of Argumentation: Aristotelian

  • Goal: Persuade the other party to take your point of view.

  • Ethos – persuasion through the author's character or credibility.

  • Logos – persuasion through logic.

  • Pathos – persuasion through emotion or disposition.

  • Kairos – an appeal made through the adept use of time.

3 of 29

Classical Theories of Argumentation: Rogerian

  • Goal: reach common ground between 2 opposing parties.

  • Focused on compromise.

  • Thorough examinations of pros and cons to propose a resolution.

4 of 29

Disagreement = Arguments + Attack

5 of 29

Graham’s Hierarchy

6 of 29

Direct insult

7 of 29

I know better than you; you do not have a physics degree

8 of 29

Off-topic: “I currently have 6 Gmail invites and I want to give 3 away to Wikipedia members” (Talk page for the article on Gmail).

9 of 29

Policing: “You’ve said that before”, “calm down”, correcting spelling errors

10 of 29

Stating stance

11 of 29

Repeated argument: no new argument or information, simply repeating.

12 of 29

Contradiction with new reasoning

13 of 29

Directly responding to part of the argument and explaining the stance.

14 of 29

Truly refuting something.

15 of 29

Authors’ Hierarchy: Rebuttal Tactics

Repeated Argument.

Off-topic.

Ethos

Pathos

Logos

Logos

Logos

16 of 29

Authors’ Coordination Tactics

Rogerian

17 of 29

Data Source

  • Wikipedia Talk Pages: administration pages where editors can discuss improvements to articles or other Wikipedia pages.

  • Wiki disputes dataset: Talk pages tagged as “disputes” by editors + “escalation” labels. 213 disputes (conversations), 3865 utterances.

  • Can be biased towards a common goal (naturally inclined to Rogerian arguments).

  • Wikipedia Guideline: https://en.wikipedia.org/wiki/Wikipedia:Dispute_resolution

18 of 29

Data Labeling

19 of 29

Average rebuttal score doesn’t help resolve disagreement.

  • Mean rebuttal score has a weak negative correlation with the “escalation” label. (-0.19, -0.24)
    • Is this significant enough?

  • Why: taking the average ignores temporal order.
    • A high-level rebuttal followed by name calling scores the same as the reverse order.
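
To illustrate the point about temporal order, here is a minimal sketch. The level sequences are invented for illustration (not taken from the dataset), the numeric scale (higher = stronger rebuttal) is an assumption, and `trend` is a hypothetical helper rather than anything from the paper.

```python
from statistics import mean

# Hypothetical rebuttal levels per utterance, in temporal order.
# Assumption: higher numbers mean higher-level rebuttals; values are made up.
degrading = [6, 5, 3, 1, 0]  # starts with strong rebuttals, ends in name calling
improving = [0, 1, 3, 5, 6]  # the same utterances in reverse order

def trend(levels):
    """Crude order-aware signal: last level minus first level."""
    return levels[-1] - levels[0]

# The mean cannot tell the two disputes apart, but the trend can.
same_mean = mean(degrading) == mean(improving)
print(same_mean, trend(degrading), trend(improving))
```

Both disputes have the same mean level, yet they move in opposite directions, which is exactly the information an averaged score discards.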

20 of 29

The context of Personal Attack (PA)

  • PA: name calling (65 cases) + ad hominem (575 cases, third most common label)

  • Most common: ad hominem + counter argument (119 occurrences)
    • You aren’t an expert in X Area. You didn’t consider Y effect on it.
    • Even more than ad hominem alone.
    • Also highly correlated: ad hominem + repeating argument

  • Not likely to see: PA + coordination labels
    • You know nothing about X area. I don’t know either.
    • Excluding “bailing out”: You know nothing, and I don’t want to discuss with you.

21 of 29

The effect of PA

  • PA leads to escalation. (60.7% of PAs go to escalation.)

  • There is a recovering stage.
    • Recovery definition: having an utterance labeled Level 5 or above, with no further PA.
    • Half of the disputes were found to recover after PA. Wikipedia users might be more adept at moving beyond personal attacks.
    • Escalated: 44.3% are found to be recovered after PA.
    • Resolved: 59.2% are found to be recovered after PA.
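
The recovery definition above can be sketched as a small check. This is one reading of the definition, not the authors' code: a dispute counts as recovered if some utterance after the last PA reaches Level 5 or above. The function name `recovers_after_pa` and the `(level, is_pa)` tuple format are hypothetical.

```python
def recovers_after_pa(utterances, min_level=5):
    """utterances: list of (level, is_pa) tuples in temporal order.

    Returns None if the dispute contains no personal attack (PA),
    otherwise True when an utterance after the last PA reaches min_level.
    """
    pa_positions = [i for i, (_, is_pa) in enumerate(utterances) if is_pa]
    if not pa_positions:
        return None  # recovery is undefined without a PA
    last_pa = pa_positions[-1]
    return any(level >= min_level for level, _ in utterances[last_pa + 1:])

# Invented examples for illustration only.
recovered = [(4, False), (1, True), (3, False), (5, False)]
relapsed = [(5, False), (1, True), (6, False), (0, True), (3, False)]
no_attack = [(4, False), (5, False)]

print(recovers_after_pa(recovered), recovers_after_pa(relapsed), recovers_after_pa(no_attack))
```

In the second example the dispute reaches a high level after the first PA but then attracts another PA, so under this reading it does not count as recovered.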

22 of 29

Users are infected by the “environment”

  • Mirroring effect: a social phenomenon where speakers reflect the behavior of others in a conversation.

  • 57% positive for mirroring effect.

23 of 29

Predict dispute tactics

  • Binary relevance classification: Each label has its own binary predictor.

  • Label powerset: each class is one possible combination of the labels. E.g., Class 1 = {DH1}, Class 2 = {DH1, DH2}, …
    • 2^L combinations. Authors pick the top 20 labels.

  • Deep-learning BR: directly predict each label.

  • Network structure: Bag of words + MLP, LSTM + attention layer (HAN), BERT
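
The difference between the two problem transformations can be sketched as follows. This is a toy illustration, not the authors' pipeline: the label sets are invented, and truncating to the most frequent combinations with a catch-all class of -1 is an assumption about how a truncated label powerset might be built.

```python
from collections import Counter

# Hypothetical label sets, one per utterance (invented for illustration).
samples = [
    {"ad hominem", "counterargument"},
    {"counterargument"},
    {"ad hominem", "counterargument"},
    {"repeated argument"},
    {"counterargument"},
    {"ad hominem", "repeated argument"},
]
labels = sorted(set().union(*samples))

def binary_relevance_targets(samples, labels):
    """One 0/1 target vector per label; each would feed its own binary classifier."""
    return {lab: [int(lab in s) for s in samples] for lab in labels}

def label_powerset_targets(samples, top_k):
    """Turn each label combination into a single class id.

    Only the top_k most frequent combinations get their own class;
    every other combination maps to the catch-all class -1.
    """
    combos = [frozenset(s) for s in samples]
    frequent = [c for c, _ in Counter(combos).most_common(top_k)]
    class_id = {c: i for i, c in enumerate(frequent)}
    return [class_id.get(c, -1) for c in combos]

br = binary_relevance_targets(samples, labels)
lp = label_powerset_targets(samples, top_k=2)
print(br["ad hominem"])
print(lp)
```

Binary relevance treats labels independently, while the label powerset keeps co-occurring labels together as one class, at the cost of a class count that grows with the number of observed combinations.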

24 of 29

Results

25 of 29

Results, cont’d

  • Truncated LP performs best: there are strong correlations between certain classes.

  • If we calculate the proportion of the test set with at least one label correctly predicted, the best model reaches 39.5% accuracy.
    • However, my trivial algorithm can get 35.5% accuracy.

  • Refutation and refuting the central point are NEVER correctly predicted (out of 44 cases), with counterargument often mistakenly predicted instead.
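
The "at least one label correct" measure can be sketched as below. The trivial algorithm mentioned above is not specified on the slide, so the majority-label baseline here is only an assumed stand-in, and all data are invented; `at_least_one_correct` and `majority_label_baseline` are hypothetical names.

```python
from collections import Counter

def at_least_one_correct(gold, pred):
    """Fraction of examples whose predicted and gold label sets share a label."""
    hits = sum(1 for g, p in zip(gold, pred) if g & p)
    return hits / len(gold)

def majority_label_baseline(train_gold, n_test):
    """Always predict the single most frequent training label (assumed baseline)."""
    counts = Counter(lab for label_set in train_gold for lab in label_set)
    top_label, _ = counts.most_common(1)[0]
    return [{top_label}] * n_test

# Invented toy data for illustration only.
train = [{"counterargument"}, {"counterargument", "ad hominem"}, {"stating stance"}]
test = [
    {"counterargument"},
    {"stating stance"},
    {"ad hominem", "counterargument"},
    {"repeated argument"},
]

pred = majority_label_baseline(train, len(test))
score = at_least_one_correct(test, pred)
print(score)
```

Because the measure rewards any overlap, a predictor that always emits the most frequent label can score well when that label dominates the data, which is why a small gap over a trivial baseline is not strong evidence of a working model.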

26 of 29

Predict ordinality

27 of 29

What I like about the paper.

  • This is a hard problem to work on: lack of data, no clear annotation scheme, etc.
  • Innovative refinement of the hierarchical rebuttal levels and coordination tactics.
  • A rich selection of performance metrics.
  • Good discovery on the correlation study.

28 of 29

What I don’t like about the paper.

  • Writing style is “sparse” at best.
    • Lots of background and related work paragraphs spread all over the paper.
    • Innovation points are also spread all over the paper. The most extreme case is that the authors introduce a technique even after the final results have been presented.
    • Insufficient explanation of the formulas.

29 of 29

What I don’t like about the paper, cont’d

  • Fails to answer the initial research question: how to rebut well?
    • One of the ML training goals is to predict whether the rebuttal level will increase, but this seems misaligned with that question.
    • The reported average rebuttal scores are only WEAKLY negatively correlated with escalation.

  • The predicting network doesn’t seem to work well.