1 of 16

Project facilitator

Simone Persico

Facilitators

Annick Vignes

Christophe Prieur

DMI Winter School 2022

Crawling (into) the Italian green pass debate on Twitter

The power of disinformation

2 of 16

Topic and relevance

  • The green pass has been a hot topic in Italy (as well as in most Western countries) over the past months.

  • A recent Censis annual report highlights an increasingly irrational Italian society.

tinyurl.com/censis-societa-irrazionale

  • Censis also states that 4.5 million Italians use only social media as a source of information.

tinyurl.com/censis-informazione-sui-social

Meme translation:

<Anti-vaxxer outside the restaurant> <Me, having a Green Pass>

3 of 16

Dataset

4,300,000+ tweets posted by Italian users between June 15 and December 14, 2021, containing the keywords ‘greenpass’, ‘green pass’ or ‘supergreenpass’.

tweets

tweets that include mentions

tweets that include links

4 of 16

Focus

The analysis focuses on the URLs shared by users, with particular attention to information sources.

tweets that include links
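As an illustration of this focus, the following minimal sketch pulls the domains out of links found in tweet texts and counts how often each is shared. It is not the 4CAT/TCAT pipeline used in the project: the function names, the regular expression and the sample tweets are hypothetical, and it assumes the tweet text already contains expanded URLs rather than shortened ones.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple pattern for http(s) links; an assumption, not the project's extractor.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_domains(text):
    """Return the domains of all http(s) URLs found in a tweet text."""
    domains = []
    for url in URL_PATTERN.findall(text):
        # Drop trailing punctuation that often sticks to links in running text.
        host = urlparse(url.rstrip(".,;:!?)")).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    return domains


def count_domains(tweets):
    """Count how many times each domain is shared across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_domains(text))
    return counts


# Hypothetical sample tweets using reserved example domains.
sample_tweets = [
    "Green pass da oggi https://www.example.org/news/1",
    "Leggete qui: https://example.org/news/2 e https://blog.example.net/post.",
    "Nessun link in questo tweet #greenpass",
]
domain_counts = count_domains(sample_tweets)
```

Counting at the domain level rather than the full URL is what allows sources to be compared later on.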

5 of 16

Research question

The green pass as a case study: do mainstream media resist conspiracist websites?

The purpose is to use the green pass debate in Italy as a case study to investigate the network of URLs shared in tweets. The goal is to find out more about information sources and to measure a possible contamination of mainstream media by conspiracist sites over time.

6 of 16

Focus

Four key periods were examined in particular.

  • June 15-29: Before the green pass

  • July 20-27: Green pass announced

  • Sep 28-Oct: Green pass for workers

  • Nov 16-30: Super green pass

7 of 16

Methodology

  • Data was collected using 4CAT and then aggregated into TCAT.

  • We extracted a list of the most relevant URLs and crawled them with Hyphe to map a larger network.

  • The source domains were categorized by proximity into Mainstream and Not-mainstream. A list of starting sources was labelled as follows:

Mainstream (by political membership)

Christian democracy;
Conservatism;
Right-wing populism;
Liberalism;
Populism;
Center;
Left-wing populism;
Social liberation;
Socialism.

Not-mainstream (using reliable Italian debunking sites as source)

Sites of hoaxes and medical and scientific information;
Political disinformation sites;
Religious and/or racial disinformation sites;
Political and racial hoax sites;
Clickbait sites;
Clickbait social pages;
Sites and social pages of conspiracy theorists.

  • The network of URLs was studied over time with two approaches:

Micro approach: ego network on byoblu

Macro approach: network analysis through distance score and evolution of network structure.

8 of 16

Micro approach: Byoblu

Byoblu Ego Networks

  • Focus on peaks: the first in July and the last in November 2021

  • Focus on troughs: the first half of July and the first week of October

  • Network filtered in Gephi to the ego network (depth 3) of the byoblu Twitter profile
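The ego-network filter was applied in Gephi; the sketch below only illustrates what a depth-3 ego network is, using a breadth-first search that stops three hops from the centre. The function name, the toy edge list and the choice to treat edges as undirected are assumptions made for the example.

```python
from collections import deque


def ego_network(edges, center, depth=3):
    """Return the hop distance of every node within `depth` hops of `center`,
    plus the edges whose two endpoints are both inside that neighbourhood.
    Edges are treated as undirected (an assumption of this sketch)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for other in neighbours.get(node, ()):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)

    kept_edges = [(a, b) for a, b in edges if a in distance and b in distance]
    return distance, kept_edges


# Toy chain of accounts: only nodes up to three hops from "byoblu" are kept.
edges = [("byoblu", "a"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
distance, kept_edges = ego_network(edges, "byoblu", depth=3)
```

In this toy chain, nodes `d` and `e` lie more than three hops from the centre, so they and their edges are dropped.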

9 of 16

Micro approach: byoblu

From 1 to 12 July, trough before the green pass introduction

From 20 to 27 July, first peak due to the green pass introduction

10 of 16

Micro approach: byoblu

From 29 September to 6 October, trough before the green pass was made compulsory for workers

From 17 to 29 November, peak before the super green pass introduction

11 of 16

Macro approach: network analysis

Dominant voices week by week

Here we can observe the evolution of the dominant websites in the Twitter corpus, namely the domains that were shared the most. This type of graph lets us track how and when a website becomes prevalent in the debate.
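A week-by-week ranking of this kind can be sketched as follows. The function name and the sample shares are hypothetical, and grouping by ISO week is an assumption about how the weeks were delimited.

```python
from collections import Counter, defaultdict
from datetime import date


def dominant_domains_by_week(shares, top_n=1):
    """Group (day, domain) shares by ISO (year, week) and return, for each
    week, the `top_n` most shared domains with their share counts."""
    weekly = defaultdict(Counter)
    for day, domain in shares:
        iso = day.isocalendar()
        weekly[(iso[0], iso[1])][domain] += 1
    return {week: counts.most_common(top_n) for week, counts in sorted(weekly.items())}


# Hypothetical shares: one tuple per tweet that links to a domain.
shares = [
    (date(2021, 7, 20), "site-a.example"),
    (date(2021, 7, 21), "site-a.example"),
    (date(2021, 7, 22), "site-b.example"),
    (date(2021, 11, 16), "site-b.example"),
    (date(2021, 11, 17), "site-b.example"),
]
dominant = dominant_domains_by_week(shares)
```

Raising `top_n` gives the kind of ranking that can be plotted week by week.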

12 of 16

Macro approach: distance score analysis

  • Network: top 400 most shared URLs in the dataset + ~15k crawled URLs (Hyphe)
  • Proximity score based on distance to mainstream/non-mainstream websites
  • The numbers of mainstream and non-mainstream websites loosely follow the number of tweets per week
  • However, the proportion of mainstream to non-mainstream websites stays constant

Distance score analysis

Number of websites at distance 1 from mainstream/non-mainstream media per week

→ Suggests that mainstream media are “resisting” non-mainstream media activity well
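The slides do not give the exact formula of the proximity score, so the sketch below covers only the counting step named above: for each week, how many websites sit at distance 1 from a set of labelled seed sites. The function names, the weekly edge lists and the seed sets are hypothetical, and hyperlinks are treated as undirected.

```python
from collections import deque


def distances_from_seeds(edges, seeds):
    """Multi-source breadth-first search: hop distance from the nearest seed.
    Hyperlinks are treated as undirected (an assumption of this sketch)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for other in neighbours.get(node, ()):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return distance


def count_at_distance_one(edges, seeds):
    """Number of websites directly linked to at least one seed site."""
    return sum(1 for d in distances_from_seeds(edges, seeds).values() if d == 1)


def weekly_counts(weekly_edges, mainstream, non_mainstream):
    """For each week, return (count near mainstream, count near non-mainstream)."""
    return {
        week: (count_at_distance_one(edges, mainstream),
               count_at_distance_one(edges, non_mainstream))
        for week, edges in weekly_edges.items()
    }


# Hypothetical seeds and weekly hyperlink edges.
mainstream = {"news.example"}
non_mainstream = {"hoax.example"}
weekly_edges = {
    "2021-W29": [("news.example", "x.example"), ("news.example", "y.example"),
                 ("hoax.example", "y.example")],
    "2021-W46": [("news.example", "x.example"), ("hoax.example", "z.example"),
                 ("z.example", "w.example")],
}
counts = weekly_counts(weekly_edges, mainstream, non_mainstream)
```

Comparing the two counts per week gives the kind of proportion that the slide reports as constant.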

13 of 16

Macro approach: hyperlinks network analysis

The goal was to build a list of the unique domains cited per week (URLs already in our dataset plus URLs crawled with Hyphe that were in our dataset) and to animate them over time in Gephi.

Hyperlinks network analysis: a methodological issue

July

September

November

We observe a huge decrease in the number of URLs present in the dataset over time. As scientists, especially with so many technical steps in the process, we had to take a step back. A plausible explanation lies in the initial filter, which kept only the domains cited by at least 10 tweets. The later a site appears in the timeline, the less likely it is to reach 10 citations over the whole timeline and therefore to appear in the visualization.
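The suspected bias can be reproduced with a toy model. Assume, purely for illustration, that a domain is cited at a constant rate from the week it first appears until the end of a 26-week timeline. The function name and the citation rate are hypothetical, and only the threshold of 10 tweets comes from the analysis.

```python
def passes_global_threshold(first_week, total_weeks, tweets_per_week, min_tweets=10):
    """True if a domain first cited in `first_week` (0-based) and then cited
    `tweets_per_week` times every week reaches `min_tweets` by the end."""
    active_weeks = total_weeks - first_week
    return tweets_per_week * active_weeks >= min_tweets


TOTAL_WEEKS = 26  # roughly June 15 to December 14, 2021

# Same citation rate for every domain; only the week of first appearance varies.
visible_start_weeks = [
    week for week in range(TOTAL_WEEKS)
    if passes_global_threshold(week, TOTAL_WEEKS, tweets_per_week=2)
]
```

With two citations per week, a domain survives the filter only if it first appears in one of the first 22 weeks. An equally active domain that appears later is removed, which matches the thinning of the network towards November.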

14 of 16

Results and conclusions

  • Conclusion 1: During the peak, the weight of byoblu decreases.

  • Conclusion 2: The proportion of mainstream to non-mainstream media is stable over the studied period (an optimistic finding!)

Possible future developments of the research:

  • Test the generalizability of these results by comparing them with Twitter patterns on other topics

  • Focus on the number of bridges to other social media and their possible implications/effects

15 of 16

Main takeaway

We are not crawling into the dark

16 of 16

Thank you

#vaccinate