
Crowdsourced Fact-Checking at Twitter: How Does the Crowd Compare With Experts?

Mohammed Saeed, Maelle Nicolas, Paolo Papotti, Nicolas Traub, Gianluca Demartini


SIGIR Student Travel Grant


[Figure: an expert fact-checker concludes "Claim is False!"]


[Figure: approaches to fact-checking: an expert fact-checker, a machine-based algorithm, and a crowd of non-experts (BirdWatch)]


Research Questions

  • How are check-worthy claims selected by Birdwatch users?

  • Can the crowd identify check-worthy claims before experts do?
  • Are crowd workers able to reliably assess the veracity of a tweet?

  • What sources of information are used to support a fact-checking decision in Birdwatch and how reliable are they?


[Figure: Fact-checking pipeline: Claim Selection → Evidence Retrieval → Claim Verification]


BirdWatch Tour


[Figure: a tweet receives notes from three noters (Noter #1-#3), and the notes are rated by two raters (Rater #1-#2)]

  1. A user writes a tweet.
  2. BirdWatch users write notes on the tweet.
  3. Other BirdWatch users rate these notes.
  4. A final verdict on the tweet's truthfulness is associated with the tweet.
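The workflow above can be modeled as a small data structure. The following is a minimal sketch only: the class names, field names, and the rule used to derive a verdict (majority classification among notes that most of their raters found helpful) are illustrative assumptions, not Birdwatch's actual schema or scoring algorithm.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Hypothetical data model for the tweet -> notes -> ratings workflow.
# Field names and the verdict rule are illustrative assumptions.


@dataclass
class Rating:
    rater_id: str
    helpful: bool  # simplified: did the rater find the note helpful?


@dataclass
class Note:
    noter_id: str
    # e.g. "MISINFORMED_OR_POTENTIALLY_MISLEADING" or "NOT_MISLEADING"
    classification: str
    ratings: list[Rating] = field(default_factory=list)


@dataclass
class Tweet:
    tweet_id: str
    notes: list[Note] = field(default_factory=list)

    def verdict(self) -> str | None:
        """Majority classification among notes that most raters found helpful (assumed rule)."""
        votes = Counter(
            note.classification
            for note in self.notes
            if note.ratings
            and sum(r.helpful for r in note.ratings) > len(note.ratings) / 2
        )
        if not votes:
            return None  # no note has been rated helpful yet
        return votes.most_common(1)[0][0]
```

For instance, a tweet with two helpful "misleading" notes and one unhelpful "not misleading" note would receive the verdict `MISINFORMED_OR_POTENTIALLY_MISLEADING`.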


Example (1/2)


Note #1

Potentially Misleading Dec 17

According to numerous independent sources, Trump lost the election. Politifact, 1/6/21: https://www.politifact.com/factchecks/2021/jan/07/donald-trump/trump-clings-fantasy-landslide-victory-egging-supp/ “All 50 states and the District of Columbia have certified their election results, which Congress sought to finalize Jan. 6. There is no evidence that voter fraud affected that outcome.”

Given current evidence, I believe this tweet is:

NOT_MISLEADING

MISINFORMED_OR_POTENTIALLY_MISLEADING

I believe this tweet contains a digitally altered photo or video.

Did you link to sources you believe most people would consider trustworthy?


Example (2/2)


Rating #1 Dec 17

Do you agree with this note’s conclusion?

Is this note helpful?

HELPFUL

SOMEWHAT_HELPFUL

NOT_HELPFUL

Does this note cite high-quality sources?

Does the note directly address the tweet’s claim?

Is the note hard to understand?

Does the note contain spam, harassment, or abuse?

Does this note miss key points?

Fact-Check Nov 07

Claim: Donald Trump won the 2020 election, by a lot.

Verdict: Not Credible

Fact Checker: Lead Stories

Country: United States

Link: https://leadstories.com/hoax-alert/2020/11/fact-check-donald-j-trump-on-twitter-I-won-by-a-lot.html
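Fact-checking organizations commonly publish their verdicts as schema.org ClaimReview markup. As a hedged sketch, the fact-check above could be expressed as ClaimReview JSON-LD and read back as follows; the mapping of the slide's fields onto the `claimReviewed`, `reviewRating.alternateName`, `author`, and `url` properties is our assumption, and "Country" is omitted because it has no direct ClaimReview counterpart.

```python
import json

# Hedged sketch: the Lead Stories fact-check above as schema.org ClaimReview
# JSON-LD. The field mapping is an assumption made for illustration.
CLAIM_REVIEW_JSONLD = """
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "url": "https://leadstories.com/hoax-alert/2020/11/fact-check-donald-j-trump-on-twitter-I-won-by-a-lot.html",
  "claimReviewed": "Donald Trump won the 2020 election, by a lot.",
  "author": {"@type": "Organization", "name": "Lead Stories"},
  "reviewRating": {"@type": "Rating", "alternateName": "Not Credible"}
}
"""


def summarize_claim_review(doc):
    """Extract the claim, verdict, fact-checker, and link from a ClaimReview JSON-LD string."""
    data = json.loads(doc)
    return {
        "claim": data["claimReviewed"],
        "verdict": data["reviewRating"]["alternateName"],
        "fact_checker": data["author"]["name"],
        "link": data["url"],
    }


summary = summarize_claim_review(CLAIM_REVIEW_JSONLD)
```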


Datasets

  • Two datasets:
    • BirdWatch data: notes and ratings for ~12K tweets
    • ClaimReview data: fact-checks for ~77K claims
  • Tweets and fact-checks are matched using Amazon Mechanical Turk (mTurk)
    • This yields 2208 tweets matched with ClaimReview fact-checks
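The slide does not say how individual mTurk judgments are combined into matched pairs. One plausible rule, shown here purely as a hypothetical sketch, is a strict majority vote over workers for each (tweet, fact-check) candidate pair; the input format `(tweet_id, claim_id, is_match)` is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical aggregation of mTurk judgments into matched (tweet, fact-check)
# pairs. Each judgment is (tweet_id, claim_id, worker_says_match). The
# majority-vote rule is an illustrative assumption, not the study's procedure.


def majority_matches(judgments):
    votes = defaultdict(list)
    for tweet_id, claim_id, is_match in judgments:
        votes[(tweet_id, claim_id)].append(is_match)
    return {
        pair
        for pair, answers in votes.items()
        if sum(answers) > len(answers) / 2  # strict majority answered "match"
    }
```

A strict majority means that a tie between workers leaves the pair unmatched, which errs on the side of precision.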



How are check-worthy claims selected by Birdwatch users?


  • We use topic-wise analysis (BERTopic) as a proxy for claim check-worthiness.
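A topic-wise comparison can be sketched as comparing how often each topic occurs among Birdwatch tweets and among ClaimReview fact-checks. This is a minimal sketch that assumes topic ids have already been assigned, for example with BERTopic's `fit_transform`, which returns a topic id per document and uses -1 for outliers; the helper names are hypothetical.

```python
from collections import Counter

# Hedged sketch: compare topic frequencies between Birdwatch tweets and
# ClaimReview fact-checks. Topic ids are assumed to come from a topic model
# such as BERTopic (topics, probs = BERTopic().fit_transform(docs)); here they
# are plain lists so the example runs without the model.


def topic_shares(topic_ids):
    """Fraction of documents assigned to each topic, excluding the outlier topic -1."""
    counts = Counter(t for t in topic_ids if t != -1)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()} if total else {}


def compare_topics(birdwatch_topics, claimreview_topics):
    """Map each topic id to its (Birdwatch share, ClaimReview share)."""
    bw = topic_shares(birdwatch_topics)
    cr = topic_shares(claimreview_topics)
    return {t: (bw.get(t, 0.0), cr.get(t, 0.0)) for t in sorted(set(bw) | set(cr))}
```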


Can the crowd identify check-worthy claims before experts do?


  • We compare the timing of tweets, Birdwatch notes, and ClaimReview fact-checks.

  • In the majority of cases, users spread false news after it had already been fact-checked.

  • For 129 of the 2208 tweets, Birdwatch users respond on average 10x faster than experts.
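The time-wise comparison can be sketched as follows, assuming each matched tweet carries three timestamps: when it was posted, when its first Birdwatch note appeared, and when the matching fact-check was published. The record layout and function names are hypothetical, and averaging per-tweet ratios is one possible reading of "on average 10x faster".

```python
from datetime import datetime
from statistics import mean

# Hedged sketch of the time-wise comparison. Each record is assumed to be
# (tweet_time, first_note_time, factcheck_time); names are hypothetical.


def speedups(records):
    """Return expert_delay / crowd_delay for tweets that Birdwatch answers first."""
    ratios = []
    for tweet_time, note_time, factcheck_time in records:
        crowd_delay = (note_time - tweet_time).total_seconds()
        expert_delay = (factcheck_time - tweet_time).total_seconds()
        # A negative expert delay means the claim was debunked before the tweet,
        # so only tweets where the crowd responds first are counted.
        if 0 < crowd_delay < expert_delay:
            ratios.append(expert_delay / crowd_delay)
    return ratios


def average_speedup(records):
    ratios = speedups(records)
    return mean(ratios) if ratios else None
```

Filtering out fact-checks that predate the tweet mirrors the observation above that most false tweets were posted after the claim had already been fact-checked.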


What sources of information are used to support a fact-checking decision in Birdwatch and how reliable are they?


                            BirdWatch            ClaimReview
  Number of domain names    2014                 73
  Examples                  FoxNews, Breitbart   PolitiFact, CDC

  • To assess the quality of web sources, we rely on an external tool (NewsGuard)
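Counting the domains cited as evidence is a natural first step before looking each one up in a source-rating service such as NewsGuard. The sketch below is a minimal, hypothetical helper; stripping a leading "www." is a normalization choice we assume here, not one stated on the slide.

```python
from urllib.parse import urlparse

# Hedged sketch: collect the distinct web domains cited in notes or fact-checks.
# Removing a leading "www." is an assumed normalization step.


def cited_domains(urls):
    domains = set()
    for url in urls:
        host = urlparse(url).hostname or ""  # hostname is lowercased, port removed
        if host.startswith("www."):
            host = host[len("www."):]
        if host:
            domains.add(host)
    return domains
```

Strings that do not parse to a hostname (for example, bare text pasted into a note) are skipped rather than counted as domains.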


Are crowd workers able to reliably assess the veracity of a tweet? (1/3)


  • External Agreement:
    • The majority of ClaimReview labels match the Birdwatch labels.
    • Reasons for mismatches are discussed in the next slides.
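External agreement can be sketched as the share of matched tweets whose Birdwatch label agrees with the ClaimReview verdict. ClaimReview ratings are free text that varies across fact-checkers, so the set of ratings mapped to "misleading" below is a hypothetical example, not the mapping used in the study.

```python
# Hedged sketch of external agreement between Birdwatch and ClaimReview.
# MISLEADING_RATINGS is a hypothetical mapping of free-text verdicts.
MISLEADING_RATINGS = {"false", "mostly false", "pants on fire", "not credible", "misleading"}


def is_misleading_rating(rating):
    return rating.strip().lower() in MISLEADING_RATINGS


def external_agreement(pairs):
    """pairs: (birdwatch_says_misleading, claimreview_rating) tuples."""
    if not pairs:
        return None
    agree = sum(bw == is_misleading_rating(cr) for bw, cr in pairs)
    return agree / len(pairs)
```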


Are crowd workers able to reliably assess the veracity of a tweet? (2/3)


Are crowd workers and computational methods able to reliably assess the veracity of a tweet? (3/3)


  Method        Matched Claims
  ClaimBuster   118/2208
  E-BART        369/2208
  BirdWatch     1492/2208
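To put the counts in the table above on a common scale, they can be expressed as percentages of the 2208 tweets; this is plain arithmetic on the reported numbers.

```python
# Counts from the "Matched Claims" column, as a percentage of the 2208 tweets.
TOTAL = 2208
matched = {"ClaimBuster": 118, "E-BART": 369, "BirdWatch": 1492}
shares = {method: round(100 * n / TOTAL, 1) for method, n in matched.items()}
# shares == {"ClaimBuster": 5.3, "E-BART": 16.7, "BirdWatch": 67.6}
```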


Key Takeaways

  • Birdwatch users and experts make correlated claim selection decisions.

  • The crowd is effective at identifying tweets that contain already-debunked misleading claims.

  • Experts rely on a small set of high-quality sources, whereas Birdwatch participants cite a much broader set of sources of more variable quality.

  • Birdwatch users show high enough levels of agreement to reach decisions in the vast majority of cases.



Thanks!

Any questions?

You can find me at

  • @MhmdSaeedms
  • saeedm@eurecom.fr

Some BirdWatch Notes

  • Pineapple does not belong on pizza.
  • I love sushi
  • Hello bird!
  • Love this program!!


Backup Slides


How are check-worthy claims selected by Birdwatch users?


  • The ClaimBuster API assigns a claim a check-worthiness score between 0 and 1.

  • We run the API on Birdwatch tweets and ClaimReview fact-checks.

  • The median score is low (0.4).
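Summarizing the scores amounts to collecting one score per sentence from the API responses and taking the median. In this minimal sketch, the response shape (`{"results": [{"text": ..., "score": ...}]}`) is an assumption modeled on ClaimBuster's JSON output and should be checked against the current API documentation; no network call is made here.

```python
import json
from statistics import median

# Hedged sketch: median check-worthiness score over ClaimBuster responses.
# The response layout {"results": [{"text": ..., "score": ...}]} is assumed.


def scores_from_response(body):
    """Return the check-worthiness scores in one (assumed) ClaimBuster JSON response."""
    return [item["score"] for item in json.loads(body).get("results", [])]


def median_score(responses):
    """Median over all scores; raises statistics.StatisticsError if there are none."""
    scores = [s for body in responses for s in scores_from_response(body)]
    return median(scores)
```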


What sources of information are used to support a fact-checking decision in Birdwatch and how reliable are they? (3/3)


  Source reliability    BirdWatch   ClaimReview
  Median                1.0         1.0
  Minimum               0.495       0.875
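A summary like the one in the table above can be sketched by looking up a reliability score for each cited domain and reporting the median and minimum. Here we assume NewsGuard-style scores on a 0-100 scale, divided by 100 to obtain values between 0 and 1; the domains and scores in the usage example are placeholders, not real ratings.

```python
from statistics import median

# Hedged sketch: median and minimum source reliability. Scores are assumed to be
# on a 0-100 scale (as NewsGuard ratings are) and normalized to 0-1.


def reliability_summary(domain_scores):
    normalized = [score / 100 for score in domain_scores.values()]
    return {"median": median(normalized), "minimum": min(normalized)}


# Placeholder domains and scores, for illustration only.
example = reliability_summary({"example-a.com": 100, "example-b.org": 95, "example-c.net": 60})
```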


Are crowd workers able to reliably assess the veracity of a tweet? (1/4)


  • Internal Agreement:

    • Standard agreement metrics fail because the data are highly sparse, with a large number of missing values.

    • Instead, we use the variance as a measure of agreement.
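Variance handles sparsity naturally because it is computed per tweet over whatever labels exist. This minimal sketch assumes "misleading" is encoded as 1 and "not misleading" as 0, with missing labels as `None`; under that encoding, a variance of 0 means all noters agree and 0.25 (an even split) is the maximum.

```python
from statistics import pvariance

# Hedged sketch: per-tweet agreement as the population variance of binary note
# labels (1 = misleading, 0 = not misleading, None = missing). Encoding assumed.


def label_variance(labels):
    observed = [x for x in labels if x is not None]
    if len(observed) < 2:
        return None  # agreement is undefined with fewer than two labels
    return pvariance(observed)
```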


Is their assessment always considered helpful by others?


  • Note Helpfulness Score
    • A helpfulness score is computed for each note.

    • 533/2208 pass the helpfulness threshold, and 333 of these label the tweet consistently with the ClaimReview fact-check.

    • About 95% of notes label the tweets as misleading, indicating that Birdwatch users tend to write notes on misleading tweets more often than on non-misleading ones, in agreement with previous work.
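A note helpfulness score can be sketched as the weighted share of ratings that found the note helpful, using the rating options shown in the example rating form. The weights (SOMEWHAT_HELPFUL counts as half), the threshold, and the minimum number of ratings below are hypothetical parameters, not Birdwatch's published values.

```python
# Hedged sketch of a note helpfulness score. WEIGHTS, threshold, and
# min_ratings are hypothetical parameters chosen for illustration.
WEIGHTS = {"HELPFUL": 1.0, "SOMEWHAT_HELPFUL": 0.5, "NOT_HELPFUL": 0.0}


def helpfulness_score(ratings):
    """Weighted fraction of helpful ratings for one note (0.0 if unrated)."""
    if not ratings:
        return 0.0
    return sum(WEIGHTS[r] for r in ratings) / len(ratings)


def passes_threshold(ratings, threshold=0.8, min_ratings=5):
    """A note passes only with enough ratings and a high enough score."""
    return len(ratings) >= min_ratings and helpfulness_score(ratings) >= threshold
```

Requiring a minimum number of ratings prevents a single enthusiastic rater from pushing a note over the threshold.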