1 of 63

Structured Data on Wikimedia Commons

and knowledge equity

Wikimania 2018, Cape Town

2 of 63

Etherpad

3 of 63

What is happening?

When?

Upcoming work

Benefits

How can this advance knowledge equity?

Leo za1 CC-BY-SA 3.0

4 of 63

Structured Data on Commons

2017-2019

adding metadata on Commons

in a structured & machine-readable format

making Commons files easier to

view, search, edit, organize and re-use,

in many languages

5 of 63

What are we doing?

6 of 63

Describing media files with Wikidata...

Jason.nlw, CC0, based on a lithograph (ca. 1840) by W. Crane, Public Domain

7 of 63

What we're working on

We worked / are working on

  • Multi-content revisions
  • Designs and prototypes for
    • file pages
    • UploadWizard
    • search
  • Prepare for GLAM pilot projects

~ October 2018: multilingual captions for files

~ Early 2019:

  • 'depicts' and other structured data
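To make the two milestones above more concrete, here is a minimal sketch of what a file with multilingual captions and 'depicts' statements could look like as data. The `MediaFile` class, its field names, and the file title are hypothetical illustrations, not the actual Commons data model; Q146 is the Wikidata item for "house cat".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MediaFile:
    """Hypothetical, simplified model of a Commons file with structured data."""

    title: str
    # Language code -> short caption in that language.
    captions: Dict[str, str] = field(default_factory=dict)
    # Wikidata item IDs describing what the file depicts (values of P180).
    depicts: List[str] = field(default_factory=list)

    def caption_for(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the caption in `lang`, or the fallback language if missing."""
        return self.captions.get(lang) or self.captions.get(fallback)


cat_photo = MediaFile(
    title="File:Example cat.jpg",  # hypothetical file name
    captions={
        "en": "A cat cleaning itself",
        "nl": "Een kat die zichzelf wast",
    },
    depicts=["Q146"],  # Q146 = house cat on Wikidata
)
```

Because the subject is stored as an item ID rather than as free text, a search interface could show it under that item's label in any language Wikidata has a label for.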

8 of 63

Macro photo of a cat cleaning itself, by Jennifer Leigh, CC BY-SA 2.0, via Flickr

9 of 63

10 of 63

11 of 63

12 of 63

13 of 63

Upcoming,

now and later in 2018-19

Jesse Owens at start of record breaking 200 meter race during the Olympic games 1936 in Berlin (photographic montage). Unknown author, 1936, Public Domain, from the United States Library of Congress.

14 of 63

Properties for structured data on Commons

Community consultation runs NOW. Participate!

→ The basic properties that will be available for media files

15 of 63

Structured copyright and licenses

→ Needed by services like search.creativecommons.org

→ Powering better attribution and re-use

→ Alignment with other structured copyright schemas, like rightsstatements.org

Number plates from around the world, from Thomas's pics, Flickr, CC BY-SA 2.0

16 of 63

GLAM pilot projects

→ Diverse and representative

→ Support and documentation

Fisherman on the Volta River, 2017, own work, Alimaihli, CC BY-SA 4.0 (for Wiki Loves Africa)

17 of 63

Support for tool developers

18 of 63

19 of 63

Benefits for Wikimedia Commons?

(With some examples of what people are currently already doing with Wikidata…)

20 of 63

Make Commons multilingual...

… so that it can be searched in languages other than English

Marten van Valckenborgh: The Tower of Babel (1595), Gemäldegalerie Alte Meister, Public Domain

Henri Meunier: Au theâtre, before 1922, Public Domain

21 of 63

'tableware' in Japanese - テーブルウェア

22 of 63

23 of 63

Machine-readable

A gilling machine, from Popular Science Monthly Volume 39, 1891. Unknown author, Public Domain

24 of 63

By becoming compatible with structured data used by other websites, we can be part of more discovery platforms

25 of 63

Structured metadata and APIs to improve re-use

Reduce, Re-Use, Recycle, by Marcus Quigmire, CC BY-SA 2.0

26 of 63

More value and potential of Commons' media files beyond Wikipedia

27 of 63

… where Wikidata enables multilingual search and browsing

28 of 63

Linked data - Wikimedia's data connected with other datasets around the world

Linked Open Cloud diagram as of 30 May 2018, Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak / contributors to lod-cloud.net, CC BY-SA 3.0

29 of 63

30 of 63

Wikidata's WikiProject Sum of all Paintings…

All the museum collections which have paintings in them with the count per location

http://tinyurl.com/zqjz6q2
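A query of this kind could be sketched as follows. This is not necessarily the query behind the short link above. It assumes that paintings are modelled with P31 (instance of) = Q3305213 (painting) and that the holding institution is recorded with P195 (collection). The function name and the `limit` parameter are illustrative only.

```python
def paintings_per_collection_query(limit: int = 50) -> str:
    """Build a SPARQL query that counts paintings per collection on Wikidata."""
    return f"""
SELECT ?collection ?collectionLabel (COUNT(?painting) AS ?count) WHERE {{
  ?painting wdt:P31 wd:Q3305213 ;   # instance of: painting
            wdt:P195 ?collection .  # collection holding the painting
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?collection ?collectionLabel
ORDER BY DESC(?count)
LIMIT {int(limit)}
""".strip()


if __name__ == "__main__":
    # Paste the printed query into the Wikidata Query Service to run it.
    print(paintings_per_collection_query(limit=20))
```

The sketch only builds the query text; running it requires a SPARQL endpoint such as the Wikidata Query Service, and a query over all paintings may need further restriction to finish within the service's time limit.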

31 of 63

Querying & bringing together (thematic) information that may not have been combined before (across & also outside heritage collections!)

32 of 63

How can this advance knowledge equity?

(With some examples of what people are currently already doing with Wikidata…)

33 of 63

More potential for community improvements to media files

… even if they're minimally described when uploaded

Fatalulu on the East coast of Tutala (Timor). Siboga Expedition 1899-1900. Public Domain, Special Collections of the University of Amsterdam

34 of 63

Better crowdsourcing

with data that can be given back to an institution in a format that works for them

35 of 63

Multilinguality … also for small languages!

Jason.nlw, CC0, based on a lithograph (ca. 1840) by W. Crane, Public Domain

36 of 63

What is depicted in images in the collections of the National Library of Wales? Wikidata query in Welsh, French, Korean and Arabic
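The multilingual aspect can be sketched by building one query and changing only the label language. The code below assumes that collection items carry P195 (collection) and P180 (depicts) statements. The item ID used for the National Library of Wales is an assumption that should be verified before use, and the function name is hypothetical.

```python
import re

# Assumed Wikidata item ID for the National Library of Wales; verify before use.
COLLECTION_QID = "Q666063"


def depicted_subjects_query(collection_qid: str, lang: str) -> str:
    """Build a SPARQL query listing depicted subjects, labelled in `lang`."""
    if not re.fullmatch(r"Q[1-9]\d*", collection_qid):
        raise ValueError(f"not a Wikidata item ID: {collection_qid!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"not a language code: {lang!r}")
    return f"""
SELECT ?subject ?subjectLabel (COUNT(?item) AS ?count) WHERE {{
  ?item wdt:P195 wd:{collection_qid} ;  # item is in the given collection
        wdt:P180 ?subject .             # what the item depicts
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang},en". }}
}}
GROUP BY ?subject ?subjectLabel
ORDER BY DESC(?count)
""".strip()


# Same query, four label languages: Welsh, French, Korean and Arabic.
QUERIES = {
    lang: depicted_subjects_query(COLLECTION_QID, lang)
    for lang in ("cy", "fr", "ko", "ar")
}
```

Only the language parameter differs between the four queries; the underlying statements are shared, which is what lets smaller languages benefit from work done in any language.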

37 of 63

Equity in vocabularies

Actively add more diverse terms, people, places… to Wikimedia projects

38 of 63

Let's imagine and work together...

Folklore dance from Tunisia, Mohamed Kamal, 2016, CC BY-SA 4.0 from Wiki Loves Africa

39 of 63

Annual photographic celebration of Africa’s cultures and diversity

40 of 63

65% of people are visual learners

Africa is the least covered region on Wikimedia projects

41 of 63

Wiki Loves Africa

  • Annual photographic and media competition.

  • Celebrates all that is quintessential about the African continent.

  • Each year, submitted images illustrate a specific theme that highlights individual passions and daily encounters, collective idiosyncrasies and universal humanity.
  • Run on a continental level with local events organised by participating usergroups and volunteers.

42 of 63

2014: Cuisine

2015: Cultural Fashion and Adornment

2016: Music and Dance

2017: People at Work

2019: Play!

2014: Cuisine

1st prize winner

43 of 63

2014: Cuisine

• 873 people contributed

• 6,116 photographs entered

• 2,420,791 page views

• 969 images used (15.44%) March ‘18

2015: Cultural Fashion and Adornment

• 722 people contributed

• 7,500 photographs entered

• 10,482,196 page views

• 609 images used (8.25%) March ‘18

2015: Cultural Fashion & Adornment

1st prize winner

44 of 63

2016: Music and Dance

• 836 people contributed

• 7,844 photographs entered

• 1,206,727 page views

• 319 images used (4.04%) March ‘18

2017: People at Work

• 2,473 people contributed

• 18,294 photographs entered

• 2,929,945 page views

• 1,124 images used (6.14%) March ‘18

2016: Music & Dance

1st prize winner

45 of 63

Challenges

The majority of users are first-time contributors:

  • File names are tricky
  • Descriptions are difficult
  • Categorisation is completely foreign and therefore usage is relatively low unless actively tasked by a member of the team (some kind of AI would definitely help here ;-))
  • Images are “dumped” with the country tying images together, but nothing else.
  • Cleaning up what suits criteria is manual and normally done by one person.
  • Small teams - very little local team input (confidence)
  • Photo Essays - keeping them together and automatically representing
  • Need to curate the process more ...

2017: People at Work

1st prize winner

46 of 63

Berria infographics

How we can reuse newspapers’ content

47 of 63

  • The Berria newspaper now publishes its web version under CC BY-SA.
  • They have published more than 16,000 infographics in their history.
  • They are now uploading them to Commons, with the tags they use in their newspaper (e.g. Africa).

48 of 63

  • Everything is uploaded in SVG format, so i18n is easier.
  • Lots of infographics are about very specific subjects.

(e.g. unemployment in the Basque Country in August 2015)

  • Some others are highly valuable for everyone.

49 of 63

  • Tags and topics can benefit broadly from P180 ('depicts').
  • Language translations can be easier once the files start to be better organized.
  • Infographics contain text; let’s dream of a future with automatic SVG translation from Wikidata entities.
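As a rough illustration of the last idea, the sketch below replaces the text of SVG `text` and `tspan` elements with labels in a target language. The `LABELS` table is a hypothetical stand-in for labels that would be looked up from Wikidata entities; `translate_svg` and `SAMPLE` are illustrative names, and real infographics would need more careful matching and layout handling.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep plain <svg> tags when serialising

# Hypothetical label table: source text -> {language code: label}.
# In a real workflow these labels would come from Wikidata entities.
LABELS = {
    "Africa": {"eu": "Afrika", "fr": "Afrique"},
}

TEXT_TAGS = {f"{{{SVG_NS}}}text", f"{{{SVG_NS}}}tspan"}


def translate_svg(svg_source: str, lang: str, labels=LABELS) -> str:
    """Return the SVG with known text strings replaced by labels in `lang`."""
    root = ET.fromstring(svg_source)
    for elem in root.iter():
        if elem.tag not in TEXT_TAGS:
            continue
        original = (elem.text or "").strip()
        translated = labels.get(original, {}).get(lang)
        if translated:  # leave text untouched when no label is known
            elem.text = translated
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="10" y="20">Africa</text>'
    "</svg>"
)

if __name__ == "__main__":
    print(translate_svg(SAMPLE, "eu"))
```

Text with no known label is left as it is, so a partially translated infographic still renders with its original wording.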

50 of 63

Digitized publications on Commons from the Indian Wikimedia communities

51 of 63

Statistics of Indic Wikisource

  • 11 languages

  • 123 readable books

(As of 26 Feb 2018)

52 of 63

Workflow

  • Copyright free/PD books
  • Digitization
  • Wikimedia Commons
  • Wikisource
    • OCR
    • Proofreading
    • Validating

53 of 63

Leo za1 CC-BY-SA 3.0

Steps for Indexing the books on Wikisource

54 of 63

Structured data and Wikisource

  • It will make indexing books on Wikisource easier and save time.
  • Extract the data related to the work of the author/publisher
  • Activate Listeria bot: when a new file is added to Commons, it will auto-update the list.
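The auto-updating list idea can be sketched as a function that regenerates a wikitext table from a fresh set of rows each time it runs. This is not Listeria's implementation; `render_wikitext_table`, the column names, and the sample rows are hypothetical, and the rows would in practice come from a query over the structured data.

```python
from typing import Dict, List


def render_wikitext_table(columns: List[str], rows: List[Dict[str, str]]) -> str:
    """Render rows as a sortable wikitext table with one column per key."""
    lines = ['{| class="wikitable sortable"', "! " + " !! ".join(columns)]
    for row in rows:
        lines.append("|-")
        # Missing values become empty cells rather than raising an error.
        lines.append("| " + " || ".join(str(row.get(col, "")) for col in columns))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rows standing in for query results.
    books = [
        {"title": "Example book 1", "author": "Example author"},
        {"title": "Example book 2", "author": "Example author"},
    ]
    print(render_wikitext_table(["title", "author"], books))
```

Regenerating the whole table on every run keeps the list in step with the data: a newly added file appears simply because it shows up in the next set of rows.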

55 of 63

Structured data and Wikisource

  • Extract the data you want to show
  • Customise it according to the interest of local readers.
  • Digital catalogue.

Images from State Government of Odisha

List by: https://en.wikipedia.org/wiki/User:MKar/List/Participants_of_2017_Asian_Athletics_Championships

56 of 63

Design challenge workshop

How can multilingual structured metadata bring knowledge equity to Commons?

Wikimania 2018, Cape Town

57 of 63

Let's workshop together!

58 of 63

We work together in the etherpad!

59 of 63

60 of 63

  1. How do you think structured data on Commons can advance knowledge equity?

Broad ideas, or more concrete project ideas and proposals

61 of 63

2. Read everyone's ideas.

Feel free to structure them.

Add to your favorite ideas (3 of each):

e (3x) - this idea advances knowledge equity most

f (3x) - this idea is most feasible

62 of 63

What are the favorite ideas?

Let's discuss.

Who wants to work on them?

e - equity

f - feasible

Leo za1 CC-BY-SA 3.0

63 of 63

Enkosi!
