Research at the Service of Free Knowledge
Leila Zia, Head of Research
K-CAP 2021
2021-12-03
2001
An online encyclopedia that anyone can edit and access for free
0.5M volunteer editors
280+ languages
15B monthly pageviews
10M monthly edits
The largest encyclopedia
55M articles
Wikipedia is an evolving radical model for the governance of knowledge.
Who operates Wikipedia?
Wikimedia Foundation
Wikimedia projects
Research
Research by Victoruler (CC BY 3.0, from the Noun Project)
Research priorities
DARIO TARABORELLI /CC0
Addressing knowledge gaps
Improving knowledge integrity
White Papers: https://meta.wikimedia.org/wiki/Research:2030
Verifiability
Transparency
Neutrality
Consensus
Privacy
Mission
A connected and open Web and internet
Freedom of speech and thought
Autonomy and ownership
Decentralization
Independence
Open data, science, and code
Multilinguality
Equity
Infrastructure and compute resources
Addressing knowledge gaps
The research program: Addressing Knowledge Gaps
Identify gaps
Bridge gaps
Measure gaps
https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en
English Wikipedia (950,277)
Native Speakers 527M
Russian Wikipedia (298,215)
Native Speakers 254M
Spanish Wikipedia (261,495)
Native Speakers 389M
Portuguese Wikipedia (185,133)
Native Speakers 193M
Arabic Wikipedia (87,017)
Native Speakers 467M
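One rough way to compare the editions above is to normalize each parenthesized count by the number of native speakers. The sketch below only restates the figures from the slides; the text does not say exactly what the parenthesized counts measure, and the helper name `per_million_speakers` is a hypothetical one chosen for illustration.

```python
# Figures copied from the slides: (parenthesized count, native speakers in millions).
# What the parenthesized count measures is not stated in the text.
editions = {
    "English": (950_277, 527),
    "Russian": (298_215, 254),
    "Spanish": (261_495, 389),
    "Portuguese": (185_133, 193),
    "Arabic": (87_017, 467),
}


def per_million_speakers(count, speakers_millions):
    """Return the count normalized per one million native speakers."""
    return count / speakers_millions


ratios = {
    language: per_million_speakers(count, speakers)
    for language, (count, speakers) in editions.items()
}

# Print the editions from the highest to the lowest normalized count.
for language, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language:<11}{ratio:8.0f} per million native speakers")
```

Under this normalization the Arabic edition ranks lowest and the English edition highest among the five, which is one simple way to express the kind of disparity the maps illustrate.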
Content
In order to understand knowledge gaps we must understand not only the content gaps but also the readership and contributorship gaps.
Knowledge is socially constructed.
Bruno Latour and Steve Woolgar. Laboratory life: The construction of scientific facts.
Johnson, Isaac, Florian Lemmerich, Diego Sáez-Trumper, Robert West, Markus Strohmaier, and Leila Zia. "Global gender differences in Wikipedia readership." AAAI ICWSM 2021
Wikipedia pageviews of readers by language and gender
The Knowledge Gap Index
Aiko Chou, Martin Gerlach, Fabian Kaelin, Isaac Johnson, Marc Miquel, Miriam Redi, Leila Zia
Knowledge Equity
[from Wikimedia 2030 strategy]
_Knowledge equity: As a social movement, we will focus our efforts on the knowledge and communities that have been left out by structures of power and privilege. We will welcome people from every background to build strong and diverse communities. We will break down the social, political, and technical barriers preventing people from accessing and contributing to free knowledge._
How far are we from reaching knowledge equity?
Operationalize knowledge equity
Identify and measure the individual components (knowledge gaps) against which we can track our progress towards this goal.
Goal: The Knowledge Gap Index
Example: EU’s Gender Equality Index
Image Credits: Marc Miquel Ribe
Example: Monitoring the Gender Gap on Wikipedia
1: Identify
Build a Taxonomy of Knowledge Gaps
2: Quantify
Develop Metrics to Quantify Knowledge Gaps
3: Expose
Surface gaps in the Knowledge Gap Index
0.5
We are HERE
Our Roadmap
Taxonomy of Knowledge gaps: how we built it
Knowledge is not only about content!
Readers
Contributors
Content
Knowledge gaps:
Disparities with respect to coverage of specific groups of readers, contributors or content across Wikimedia projects.
Taxonomy of Knowledge gaps: how we built it
Finding evidence of knowledge gaps from different sources
Academic Literature
Movement Strategy and Initiatives
Community Surveys
2: Quantify
Develop Metrics to Quantify Knowledge Gaps
Knowledge Gaps Metrics:
Data Categorization
Data
Metrics Generation
Questions about knowledge gaps
What is the most popular motivation for editing Farsi Wikipedia?
Which gender group has higher quality articles and images in Greek Wikipedia?
What is the geographical distribution of articles in Kiswahili?
Stakeholder consultations
Cultural Background Gap:
What is the extent of local content coverage?
The cultural context of a language is defined as all the places, people, objects ... that relate to the territories where the language is spoken.
Miquel-Ribé, Marc, and David Laniado. "Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 13. 2019.
Content Classifier
Cultural context? y/n
Cultural context label
Mapping between languages and local territories
Articles labeled as local and non-local content
Data Categorization
Aggregation over all articles: proportion of local/nonlocal content by language
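The aggregation step can be sketched as follows. This is a minimal illustration, not the pipeline from the cited dataset: the `labeled_articles` records and the function name `local_share_by_language` are hypothetical, and the labels stand in for the output of the content classifier.

```python
from collections import defaultdict

# Hypothetical classifier output: one (language edition, is_local) pair per article.
labeled_articles = [
    ("ca", True), ("ca", False), ("ca", True), ("ca", False),
    ("sw", True), ("sw", False), ("sw", False), ("sw", False),
]


def local_share_by_language(articles):
    """Return, per language, the proportion of articles labeled as local content."""
    totals = defaultdict(int)
    local = defaultdict(int)
    for language, is_local in articles:
        totals[language] += 1
        if is_local:
            local[language] += 1
    return {language: local[language] / totals[language] for language in totals}


shares = local_share_by_language(labeled_articles)
for language, share in shares.items():
    print(f"{language}: {share:.0%} local, {1 - share:.0%} non-local")
```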
Metrics Generation
Data
Readability Gap:
What is the readability of content on Wikipedia?
Readability is the ease with which a reader can understand a written text
There exist automatic readability scores for English … but what about other languages?
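As an example of an automatic readability score for English, the sketch below computes the Flesch Reading Ease formula. The slides do not name a specific score, so this choice is an assumption, and the vowel-group syllable counter is a crude English-only heuristic, which is part of why such scores do not transfer directly to other languages.

```python
import re


def count_syllables(word):
    """Rough English heuristic: count groups of consecutive vowels, at least one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple = "The cat sat on the mat."
dense = "Readability is the ease with which a reader can understand a written text."
print(flesch_reading_ease(simple), flesch_reading_ease(dense))
```

Both the sentence splitter and the syllable heuristic assume English orthography, so a multilingual measure would need language-specific or language-agnostic replacements for them.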
Motivation Gap:
Why do people read Wikipedia?
Previous research developed survey questions in different languages to measure the motivation behind readership.
But how to measure this at scale, and for all languages?
Open questions
Content
Readership
Contributorship
General
Link recommendation
Martin Gerlach (Research Team)
Growth Team (Marshall Miller, Rita Ho, Kosta Harlan, many more)
Djellel Difallah (NYU Abu Dhabi)
Editing is hard
Problem
Does this article need an update? How do I start editing?
Is this saying a reference is needed? How do I add it?
Is this where the references are? How is it different from citations?
Technical
What is an infobox?
Conceptual
What is notability?
Cultural
Why are people so mean?
Structured task editing
Solution
Link recommendation
Entity-linking task
Hypatia (born c. 350–370; died 415 AD) was a Hellenistic Neoplatonist philosopher, astronomer, and mathematician, who lived in Alexandria, Egypt, then part of the Eastern Roman Empire.
astronomer
Astronomy
Astronomer
--no link--
?
The Add-a-link Task in Wikipedia
Step 1: Mention detection
Step 2: Link generation
Step 3-a: Link disambiguation (features)
Step 3-b: Link disambiguation (classifier)
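The steps above can be sketched as a toy pipeline. This is not the production model: the anchor dictionary, the counts, the two features, and the product-and-threshold scoring that stands in for a trained classifier are all assumptions made for illustration.

```python
# Hypothetical anchor dictionary: surface text -> {target article: times linked}.
ANCHORS = {
    "astronomer": {"Astronomer": 90, "Astronomy": 10},
    "alexandria": {"Alexandria": 95, "Alexandria, Virginia": 5},
    "who": {"World Health Organization": 1},
}
# Hypothetical counts of how often each surface text occurs at all, linked or not.
TEXT_COUNTS = {"astronomer": 120, "alexandria": 110, "who": 5000}


def detect_mentions(text):
    """Step 1: keep tokens that have been used as anchor text before."""
    tokens = [token.strip(".,;()").lower() for token in text.split()]
    return [token for token in tokens if token in ANCHORS]


def generate_candidates(mention):
    """Step 2: every article this mention has linked to is a candidate target."""
    return list(ANCHORS[mention])


def features(mention, candidate):
    """Step 3-a: share of links going to this candidate, and how often the text is linked."""
    counts = ANCHORS[mention]
    linked = sum(counts.values())
    return counts[candidate] / linked, linked / TEXT_COUNTS[mention]


def score(mention, candidate):
    """Step 3-b: stand-in for a trained classifier (simple product of the features)."""
    prior, link_rate = features(mention, candidate)
    return prior * link_rate


def recommend_links(text, threshold=0.5):
    """Return the best candidate per mention, or no link if the score is too low."""
    recommendations = {}
    for mention in detect_mentions(text):
        best = max(generate_candidates(mention), key=lambda c: score(mention, c))
        if score(mention, best) >= threshold:
            recommendations[mention] = best
    return recommendations


sentence = (
    "Hypatia was a philosopher, astronomer, and mathematician, "
    "who lived in Alexandria, Egypt."
)
links = recommend_links(sentence)
print(links)
```

In this toy setup the word "who" is detected as a mention but falls below the threshold, which corresponds to the "no link" outcome in the entity-linking example.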
Evaluation
Held-out test set + Manual evaluation (thanks: Benoît Evellin, Habib Mhenni, Martin Urbanec, Bluetpp, -revi)
Tested Wikis: Arabic, Bengali, Czech, English, French, Vietnamese
Precision: 70% - 92% (How many suggestions are correct?)
Recall: 30% - 66% (How many of the possible links are captured?)
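For reference, the two measures can be computed as below. The suggested and correct link sets are made-up examples, not data from the evaluation, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(suggested, correct):
    """Return (precision, recall) for a set of suggested links against the correct ones."""
    suggested, correct = set(suggested), set(correct)
    true_positives = len(suggested & correct)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical (mention, target) pairs for a single article.
suggested_links = {
    ("astronomer", "Astronomer"),
    ("alexandria", "Alexandria"),
    ("egypt", "Ancient Egypt"),
}
correct_links = {
    ("astronomer", "Astronomer"),
    ("alexandria", "Alexandria"),
    ("egypt", "Egypt"),
    ("mathematician", "Mathematician"),
    ("philosopher", "Philosopher"),
}

precision, recall = precision_recall(suggested_links, correct_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three suggestions are correct, and two of the five possible links are captured.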
Link recommendation model
User interface
Evaluate the suggestion
Feedback on algorithm
Edit summary
Next suggestion
In practice
Open questions
Content
Readership
Contributorship
General
Scaling Research on Free Knowledge
Growth by Fabio Rinaldi (CC BY 3.0, from the Noun Project)
DARIO TARABORELLI /CC0
A sustainable distributed network of Wikimedia projects relies on an empowered global network of Wikimedia researchers.
4:1,000,000,000
The Research Team
Martin Gerlach
Research Scientist
Isaac Johnson
Research Scientist
Emily Lescak
Senior Research Community Officer
Miriam Redi
Research Manager
Diego Sáez-Trumper
Senior Research Scientist
Pablo Aragón
Research Scientist
Learn more: https://research.wikimedia.org
Leila Zia
Director, Head of Research
Fabian Kaelin
Senior Research Engineer
Formal Collaborators
Past formal collaborators: https://www.mediawiki.org/wiki/Wikimedia_Research/Collaborators/Archive
The current initiatives and principles
To further expand and nurture the research community around the Wikimedia projects we:
Support the Wikimedia projects
Help by shashank singh (CC BY 3.0, from the Noun Project)
wikiworkshop.org (expect updated information in a week)
The 9th edition will take place as part of TheWebConf 2022.
USD 2K-50K
or follow us on Twitter: @WikiResearch
leila@wikimedia.org
http://research.wikimedia.org