1 of 57

Story of one hundred thousand translations

Wikipedia Translation

Santhosh Thottingal

Wikimedia Language Engineering

Internationalization & Unicode Conference 40

2 of 57

3 of 57

Knowledge is better served if you can understand it

https://secure.flickr.com/photos/adam_jones/5793940771/

4 of 57

English Wikipedia article coverage

http://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en

5 of 57

German Wikipedia article coverage

http://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en

6 of 57

French Wikipedia article coverage

http://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en

7 of 57

Polish Wikipedia article coverage

http://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en

8 of 57

Chinese Wikipedia article coverage

http://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en

9 of 57

Content overlap between languages of Wikipedia is small

Based on the analysis of one month of all edits to the top 46 language editions of Wikipedia by Scott A. Hale, University of Oxford.

More info at http://arxiv.org/abs/1312.0976

Big opportunities for translation

English

German

49%

10 of 57

7,102 languages in the world (Ethnologue)

Demand

11 of 57

287 Wikipedias

Demand

7,102 languages in the world (Ethnologue)

12 of 57

50%+ Monolingual users

Demand

287 Wikipedias

7,102 languages in the world (Ethnologue)

13 of 57

Next billion online users in 5 years

Demand

50%+ Monolingual users

287 Wikipedias

7,102 languages in the world (Ethnologue)

14 of 57

6500 new articles per day

Supply

15 of 57

Supply

70K contributors per month

6500 new articles per day

16 of 57

14K new accounts per month

Supply

70K contributors per month

6500 new articles per day

17 of 57

3 years for at least 40K articles in every language

It takes...

18 of 57

3 years for at least 40K articles in every language

It takes...

12 years

To double the size of Wikipedia

19 of 57

Potential users are active

Over 15% of users edit multiple language editions.

These multilingual users are more active (2.3 times) than their monolingual counterparts on average.

Multilingual users made 30% of all edits.

Based on the analysis of one month of all edits to the top 46 language editions of Wikipedia by Scott A. Hale, University of Oxford.

More info at http://arxiv.org/abs/1312.0976
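The three figures on this slide can be checked against each other with a little arithmetic. The sketch below is not taken from the cited study; it assumes exactly 15% multilingual users (the slide says "over 15%") and treats the 2.3 activity ratio as a simple per-user average.

```python
# Rough consistency check of the slide's figures (assumptions noted above).
multilingual_share_of_users = 0.15   # slide: "over 15%", so this is a lower bound
activity_ratio = 2.3                 # multilingual vs. monolingual edits per user

# Relative edit volume, with one monolingual user's activity normalised to 1.
multilingual_edits = multilingual_share_of_users * activity_ratio
monolingual_edits = (1 - multilingual_share_of_users) * 1.0

share_of_edits = multilingual_edits / (multilingual_edits + monolingual_edits)
print(f"Estimated share of edits by multilingual users: {share_of_edits:.1%}")
```

Under these assumptions the estimate comes out a little under 30%, which is in line with the "30% of all edits" figure once the share of multilingual users is somewhat above 15%.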

Main language pairs

20 of 57

Content Translation is a beta feature

Available in Wikipedia in all languages

21 of 57

Workflow

Discover article lacking translation

Create translation draft

Publish as new article

Follow the current process but avoid manual steps

Translation view

Where translations are made

Entry points

Ways to make users aware of the tool

22 of 57

Translation is not new for Wikipedia

23 of 57

Translation is not new for Wikipedia

24 of 57

Translation is not new for Wikipedia

25 of 57

26 of 57

Integration

27 of 57

Integration to contribution workflow

Hover menu on Contributions link

28 of 57

Integration to contribution workflow

My contributions page

29 of 57

Integration

Gray interlanguage link

30 of 57

The translation dashboard helps you keep track of your translations, continue ongoing translations and find articles to translate

31 of 57

Article and language selector

32 of 57

The translation tool

33 of 57

Machine Translation

34 of 57

Machine Translation

35 of 57

Misuse of MT?

Translation progress is measured.
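One way to picture how unedited machine translation might be spotted is to compare the MT output with what the translator publishes. The sketch below is an illustration only, using Python's standard `difflib`; the function name `unmodified_ratio` is hypothetical, and this is not claimed to be the metric the tool itself uses.

```python
from difflib import SequenceMatcher

def unmodified_ratio(mt_text: str, published_text: str) -> float:
    """Return a similarity score in [0, 1]; 1.0 means the MT output was published unchanged."""
    return SequenceMatcher(None, mt_text, published_text).ratio()

# Example pair: a machine-translated sentence and the translator's corrected version.
mt = "André Lotterer durant una tanda d'entrenaments de la Fórmula Nippon 2010 en Motegi."
user = "André Lotterer durant una tanda d'entrenaments de la Fórmula Nippon 2010 a Motegi."

print(f"Similarity between MT output and published text: {unmodified_ratio(mt, user):.3f}")
```

A score very close to 1.0 across a whole article would suggest that the machine translation was published with little or no review.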

36 of 57

Translation context

Current sentence is highlighted. Content is segmented at sentence level.
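Sentence-level segmentation can be illustrated with a deliberately naive splitter. This is a minimal sketch, not the tool's segmenter: it assumes sentences end in `.`, `!` or `?` followed by whitespace, which fails on abbreviations and on languages that use other sentence terminators.

```python
import re

def segment_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when followed by whitespace (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

paragraph = "Current sentence is highlighted. Content is segmented at sentence level."
for number, sentence in enumerate(segment_sentences(paragraph), start=1):
    print(number, sentence)
```

Each segment can then be shown side by side with its translation, which is what makes highlighting the current sentence possible.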

37 of 57

Formatting preserved

Many machine translation systems support only plain text translation.

38 of 57

Automatic link adaptation

Links can be added, removed or edited
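One plausible way to adapt a link is to look up the target-language title of the linked article through the MediaWiki Action API (`action=query` with `prop=langlinks`). The slides do not say how the tool does this, so the sketch below is an assumption; the helper names and the sample response are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

def build_langlinks_url(source_lang: str, title: str, target_lang: str) -> str:
    """Build an Action API URL asking for the target-language equivalent of a title."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,
        "format": "json",
    }
    return f"https://{source_lang}.wikipedia.org/w/api.php?{urlencode(params)}"

def extract_target_title(response: dict, target_lang: str):
    """Return the linked title in target_lang, or None if no such article exists."""
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                return link.get("*")  # legacy JSON format stores the title under "*"
    return None

# Hand-written response in the shape of the legacy JSON format (illustrative only).
sample_response = {
    "query": {"pages": {"1": {"title": "Example source title",
                              "langlinks": [{"lang": "ca", "*": "Example target title"}]}}}
}

url = build_langlinks_url("es", "Example source title", "ca")
print(url)
print(extract_target_title(sample_response, "ca"))
```

When the lookup returns `None`, the linked article has no counterpart in the target language, so the link cannot be adapted automatically.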

39 of 57

One click image adaptation

40 of 57

Article categories

41 of 57

References

42 of 57

Translations always happen at the target language wiki

Find source articles

Translate & Publish

43 of 57

Translations are auto-saved and can be resumed any time later

44 of 57

Link to source revision as attribution

Articles have ContentTranslation tag

45 of 57

Article Recommendations

46 of 57

Freely licensed parallel corpora

Based on translations of Wikipedia articles

https://dumps.wikimedia.org

47 of 57

Freely licensed parallel corpora

<tu srclang="es">
  <tuv xml:lang="es">
    <prop type="origin">source</prop>
    <seg>André Lotterer durante una tanda de entrenamientos de la Fórmula Nippon 2010 en Motegi.</seg>
  </tuv>
  <tuv xml:lang="ca">
    <prop type="origin">mt</prop>
    <seg>André Lotterer durant una tanda d'entrenaments de la Fórmula Nippon 2010 en Motegi.</seg>
  </tuv>
  <tuv xml:lang="ca">
    <prop type="origin">user</prop>
    <seg>André Lotterer durant una tanda d'entrenaments de la Fórmula Nippon 2010 a Motegi.</seg>
  </tuv>
</tu>
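A translation unit like the one above can be read with Python's standard library. The sketch below is a minimal example that parses a single `<tu>` element and groups its variants by the `origin` property (source text, machine translation, user-corrected translation); the function name `read_translation_unit` is hypothetical, and a full corpus file would wrap many such units in additional elements not handled here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix as this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TU = """<tu srclang="es">
  <tuv xml:lang="es"><prop type="origin">source</prop>
    <seg>André Lotterer durante una tanda de entrenamientos de la Fórmula Nippon 2010 en Motegi.</seg></tuv>
  <tuv xml:lang="ca"><prop type="origin">mt</prop>
    <seg>André Lotterer durant una tanda d'entrenaments de la Fórmula Nippon 2010 en Motegi.</seg></tuv>
  <tuv xml:lang="ca"><prop type="origin">user</prop>
    <seg>André Lotterer durant una tanda d'entrenaments de la Fórmula Nippon 2010 a Motegi.</seg></tuv>
</tu>"""

def read_translation_unit(tu_xml: str) -> dict:
    """Map each origin ('source', 'mt', 'user') to its language code and segment text."""
    tu = ET.fromstring(tu_xml)
    variants = {}
    for tuv in tu.findall("tuv"):
        origin = tuv.findtext("prop[@type='origin']")
        variants[origin] = {"lang": tuv.get(XML_LANG), "text": tuv.findtext("seg")}
    return variants

variants = read_translation_unit(SAMPLE_TU)
for origin, variant in variants.items():
    print(f"{origin:>6} [{variant['lang']}]: {variant['text']}")
```

Keeping the `mt` and `user` variants side by side is what makes the corpus useful for studying how translators correct machine output: here the only change is the preposition before "Motegi".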

48 of 57

Results

49 of 57

130,000+ new articles created

50 of 57

A new Wikipedia article every 5 minutes
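To put this rate in perspective, it can be converted to articles per day and compared with the "6500 new articles per day" figure from the supply slides. This is back-of-the-envelope arithmetic only, and it assumes both figures describe the same time period and the same set of wikis.

```python
MINUTES_PER_DAY = 24 * 60

minutes_per_translated_article = 5       # from this slide
all_new_articles_per_day = 6500          # from the "Supply" slides

translated_per_day = MINUTES_PER_DAY / minutes_per_translated_article
share_of_new_articles = translated_per_day / all_new_articles_per_day

print(f"Translated articles per day: {translated_per_day:.0f}")
print(f"Share of all new articles: {share_of_new_articles:.1%}")
```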

51 of 57

52 of 57

53 of 57

54 of 57

Wikipedia’s coverage of essential vaccines is expanding

55 of 57

The Medical Translation Task Force

has boosted its productivity by 17% and

increased the amount of health care content by using the Content Translation tool.

Wikipedia’s coverage of essential vaccines is expanding

https://blog.wikimedia.org/2016/03/29/wikipedias-essential-vaccines/

56 of 57

Burmese, Malay, and Odia: An additional 136.5 million native speakers now have articles on all 23 essential vaccines in their local language.

Thai, Romanian, and Yoruba…: An additional 71.7 million speakers now have access to at least a third of the 23 vaccine articles.

Access to health care information in their local language

57 of 57

@WhatToTranslate

mediawiki.org/wiki/CX