Evaluation Scores of Google Translate in 102 Languages
Source data for Teach You Backwards: An In-Depth Study of Google Translate for 103 Languages (http://teachyoubackwards.com)

The tables in this file report the scores of translations of the same 20 English phrases to all 102 languages in Google Translate, as rated by native or highly-competent speakers.

The data in this file is free to use according to the terms of the CC-by license at https://creativecommons.org/licenses/by/4.0/legalcode, with credit to Martin Benjamin, Kamusi Project International, 2018

Evaluators rated each translation on a scale of A, B, or C. (A = good translation, B = not correct, but someone could understand the idea, C = completely wrong)

The original English phrases given to Google Translate, and the English explanations given to the evaluators, are shown in the light purple table on the right.
Click here for itemized translations and scores in all languages

Methodology is discussed in the black boxes on the right.
Click here for language-to-language confidence scores (Tarzan-to-Tarzan intelligibility ratings between non-English pairs)

Bard is a weighted ranking that indicates the proportion of translations that were judged close to human quality. Responses scored A = 5, B = 2.5, C = 0. Maximum score = 100.

Tarzan indicates the percentage of times that translations could be understood, regardless of whether they were judged as human quality. Responses scored A = 5, B = 5, C = 0. Maximum score = 100.

Fail is the percentage of times that translations were judged as completely wrong. Responses scored A = 0, B = 0, C = 5. Maximum score = 100.

Languages in blue are those that this page says run on the Neural Machine Translation Model: https://cloud.google.com/translate/docs/languages
White = Phrase-Based Machine Translation Model.

Languages in orange are those that this page says run on the Neural Machine Translation Model: https://translate.google.com/intl/en/about/languages/

Languages in bold are among world's top 100 (https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers)

Green-headed table shows GT results for top 100 languages by number of native speakers
Dark-blue-headed table shows GT results for languages not in top 100

Spanish was evaluated separately for SPain and Latin America. Portuguese was evaluated separately for PorTugal, Cape Verde, and BRazil. Traditional and Simplified Chinese were evaluated as one.

# | original English phrase | English explanation shown to evaluators | Bard | Tarzan | Fail
1 | fly out of London | take an airplane from London | 35.20% | 79.00% | 21.00%
2 | like a bat out of hell | escaping as quickly as possible | 2.40% | 23.80% | 76.20%
3 | out cold | unconscious | 3.80% | 9.50% | 90.50%
4 | out of bounds | unacceptable | 6.70% | 41.00% | 59.00%
5 | out of breath | gasping for air (for example, after running) | 34.80% | 64.80% | 35.20%
6 | out of curiousity | because a person is casually interested in something | 33.30% | 65.70% | 34.30%
7 | out of focus | not clear to see (blurry) | 21.90% | 49.50% | 50.50%
8 | out of his mind | crazy | 4.80% | 28.60% | 71.40%
9 | out of milk | the supply of milk is finished | 4.30% | 19.00% | 81.00%
10 | out of order | does not function (broken) | 31.90% | 50.50% | 49.50%
11 | out of pocket | paid for something from personal money | 15.20% | 50.50% | 49.50%
12 | out of steam | no more energy (exhausted) | 3.30% | 10.50% | 89.50%
13 | out of style | unfashionable | 21.90% | 64.80% | 35.20%
14 | out of the closet | openly homosexual | 9.50% | 24.80% | 75.20%
15 | out of the game | no longer participating in a game | 32.40% | 68.60% | 31.40%
16 | out of the office | away from the office | 46.70% | 82.90% | 17.10%
17 | out of this world | excellent | 3.80% | 17.10% | 82.90%
18 | out of time | a deadline has passed | 14.80% | 63.80% | 36.20%
19 | out of wedlock | between partners who are not married | 30.00% | 61.00% | 39.00%
20 | out on the town | having a fun time going shopping or to bars/restaurants (carousing) | 4.30% | 21.90% | 78.10%

# | Alphabetical | Bard | Tarzan | Fail | Rank | Bard Ranking | Bard | Rank | Tarzan Ranking | Tarzan | Rank | % Fail Ranking | % Fail
1 | Afrikaans | 67.5 | 87.5 | 12.5 | 1 | Afrikaans | 67.5 | 1 | Afrikaans | 87.5 | 1 | Bengali | 100
2 | Albanian | 26.25 | 40 | 60 | 2 | German | 60 | 2 | German | 82.5 | 1 | Haitian Creole | 100
3 | Amharic | 30 | 40 | 60 | 2 | Portuguese-PT | 60 | 3 | Portuguese-BR | 80 | 1 | Tajik | 100
4 | Arabic | 32.5 | 40 | 60 | 3 | Spanish-SP | 57.5 | 3 | Portuguese-CV | 80 | 2 | Kurdish | 95
5 | Armenian | 25 | 40 | 60 | 4 | Polish | 56.25 | 4 | Spanish-LA | 75 | 2 | Nepali | 95
6 | Azerbaijani | 37.5 | 55 | 45 | 5 | Chinese | 55 | 4 | Spanish-SP | 75 | 3 | Latin | 90
7 | Basque | 37.5 | 47.5 | 52.5 | 5 | Croatian | 55 | 5 | Polish | 72.5 | 3 | Malaysian | 90
8 | Belarusian | 40 | 55 | 45 | 5 | Spanish-LA | 55 | 6 | Danish | 70 | 3 | Urdu | 90
9 | Bengali | 0 | 0 | 100 | 6 | Dutch | 52.5 | 6 | Greek | 70 | 4 | Maori | 85
10 | Bosnian | 30 | 40 | 60 | 6 | Galician | 52.5 | 7 | Chinese | 65 | 5 | Cebuano | 80
11 | Bulgarian | 40 | 60 | 40 | 6 | Greek | 52.5 | 7 | Croatian | 65 | 5 | Georgian | 80
12 | Catalan | 37.5 | 60 | 40 | 6 | Portuguese-BR | 52.5 | 7 | Dutch | 65 | 5 | Persian | 80
13 | Cebuano | 12.5 | 20 | 80 | 6 | Portuguese-CV | 52.5 | 7 | Finnish | 65 | 5 | Punjabi | 80
14 | Chichewa | 17.5 | 30 | 70 | 7 | Italian | 50 | 7 | Hungarian | 65 | 5 | Uzbek | 80
15 | Chinese | 55 | 65 | 35 | 7 | Latvian | 50 | 7 | Portuguese-PT | 65 | 6 | Xhosa | 77.5
16 | Corsican | 22.5 | 35 | 65 | 8 | Indonesian | 47.5 | 8 | Welsh | 62.5 | 7 | Hawaiian | 75
17 | Croatian | 55 | 65 | 35 | 8 | Hungarian | 47.5 | 9 | Indonesian | 60 | 7 | Javanese | 75
18 | Czech | 43.75 | 55 | 45 | 8 | Igbo | 47.5 | 9 | Bulgarian | 60 | 7 | Lao | 75
19 | Danish | 40 | 70 | 30 | 8 | Serbian | 47.5 | 9 | Catalan | 60 | 7 | Myanmar/Burmese | 75
20 | Dutch | 52.5 | 65 | 35 | 9 | Finnish | 45 | 9 | French | 60 | 7 | Pashto | 75
21 | Esperanto | 40 | 55 | 45 | 9 | French | 45 | 9 | Galician | 60 | 7 | Samoan | 75
22 | Estonian | 27.5 | 45 | 55 | 9 | Hebrew | 45 | 9 | Igbo | 60 | 7 | Swahili | 75
23 | Filipino | 25 | 35 | 65 | 9 | Swedish | 45 | 9 | Italian | 60 | 7 | Thai | 75
24 | Finnish | 45 | 65 | 35 | 10 | Czech | 43.75 | 9 | Kazakh | 60 | 7 | Yoruba | 75
25 | French | 45 | 60 | 40 | 10 | Welsh | 43.75 | 9 | Latvian | 60 | 8 | Chichewa | 70
26 | Frisian | 30 | 40 | 60 | 11 | Japanese | 42.5 | 9 | Macedonian | 60 | 8 | Hindi | 70
27 | Galician | 52.5 | 60 | 40 | 11 | Kazakh | 42.5 | 9 | Maltese | 60 | 8 | Icelandic | 70
28 | Georgian | 10 | 20 | 80 | 11 | Korean | 42.5 | 9 | Sesotho | 60 | 8 | Lithuanian | 70
29 | German | 60 | 82.5 | 17.5 | 11 | Malaysian | 42.5 | 9 | Slovenian | 60 | 8 | Marathi | 70
30 | Greek | 52.5 | 70 | 30 | 11 | Romanian | 42.5 | 9 | Swedish | 60 | 8 | Shona | 70
31 | Gujarati | 30 | 45 | 55 | 12 | Belarusian | 40 | 10 | Azerbaijani | 55 | 8 | Somali | 70
32 | Haitian Creole | 0 | 0 | 100 | 12 | Bulgarian | 40 | 10 | Belarusian | 55 | 8 | Zulu | 70
33 | Hausa | 27.5 | 35 | 65 | 12 | Danish | 40 | 10 | Czech | 55 | 9 | Kyrgyz | 67.5
34 | Hawaiian | 15 | 25 | 75 | 12 | Esperanto | 40 | 10 | Esperanto | 55 | 10 | Corsican | 65
35 | Hebrew | 45 | 45 | 55 | 12 | Mongolian | 40 | 10 | Japanese | 55 | 10 | Filipino | 65
36 | Hindi | 22.5 | 30 | 70 | 12 | Russian | 40 | 10 | Korean | 55 | 10 | Hausa | 65
37 | Hmong | 27.5 | 40 | 60 | 12 | Scots Gaelic | 40 | 10 | Romanian | 55 | 10 | Luxembourgish | 65
38 | Hungarian | 47.5 | 65 | 35 | 12 | Slovenian | 40 | 10 | Serbian | 55 | 10 | Tamil | 65
39 | Icelandic | 20 | 30 | 70 | 12 | Turkish | 40 | 11 | Malayalam | 50 | 10 | Telugu | 65
40 | Igbo | 47.5 | 60 | 40 | 13 | Azerbaijani | 37.5 | 11 | Mongolian | 50 | 10 | Vietnamese | 65
41 | Indonesian | 47.5 | 60 | 40 | 13 | Basque | 37.5 | 11 | Russian | 50 | 11 | Albanian | 60
42 | Irish | 25 | 45 | 55 | 13 | Catalan | 37.5 | 11 | Scots Gaelic | 50 | 11 | Amharic | 60
43 | Italian | 50 | 60 | 40 | 14 | Macedonian | 35 | 11 | Turkish | 50 | 11 | Arabic | 60
44 | Japanese | 42.5 | 55 | 45 | 14 | Maltese | 35 | 11 | Yiddish | 50 | 11 | Armenian | 60
45 | Javanese | 17.5 | 25 | 75 | 14 | Sinhala | 35 | 12 | Basque | 47.5 | 11 | Bosnian | 60
46 | Kannada | 20 | 40 | 60 | 15 | Arabic | 32.5 | 13 | Estonian | 45 | 11 | Frisian | 60
47 | Kazakh | 42.5 | 60 | 40 | 15 | Sesotho | 32.5 | 13 | Gujarati | 45 | 11 | Hmong | 60
48 | Khmer | 30 | 45 | 55 | 16 | Amharic | 30 | 13 | Hebrew | 45 | 11 | Kannada | 60
49 | Korean | 42.5 | 55 | 45 | 16 | Bosnian | 30 | 13 | Irish | 45 | 11 | Malagasy | 60
50 | Kurdish | 3.75 | 5 | 95 | 16 | Frisian | 30 | 13 | Khmer | 45 | 11 | Sindhi | 60
51 | Kyrgyz | 22.5 | 32.5 | 67.5 | 16 | Gujarati | 30 | 13 | Norwegian | 45 | 11 | Sundanese | 60
52 | Lao | 21.25 | 35 | 65 | 16 | Khmer | 30 | 13 | Sinhala | 45 | 11 | Ukrainian | 60
53 | Latin | 5 | 10 | 90 | 16 | Sindhi | 30 | 13 | Slovak | 45 | 12 | Estonian | 55
54 | Latvian | 50 | 60 | 40 | 16 | Ukrainian | 30 | 14 | Albanian | 40 | 12 | Gujarati | 55
55 | Lithuanian | 20 | 30 | 70 | 17 | Estonian | 27.5 | 14 | Amharic | 40 | 12 | Hebrew | 55
56 | Luxembourgish | 27.5 | 35 | 65 | 17 | Hausa | 27.5 | 14 | Arabic | 40 | 12 | Irish | 55
57 | Macedonian | 35 | 60 | 40 | 17 | Hmong | 27.5 | 14 | Armenian | 40 | 12 | Khmer | 55
58 | Malagasy | 27.5 | 40 | 60 | 17 | Luxembourgish | 27.5 | 14 | Bosnian | 40 | 12 | Norwegian | 55
59 | Malayalam | 42.5 | 50 | 50 | 17 | Malagasy | 27.5 | 14 | Frisian | 40 | 12 | Sinhala | 55
60 | Malaysian | 5 | 10 | 90 | 17 | Norwegian | 27.5 | 14 | Hmong | 40 | 12 | Slovak | 55
61 | Maltese | 35 | 60 | 40 | 17 | Slovak | 27.5 | 14 | Kannada | 40 | 13 | Basque | 52.5
62 | Maori | 10 | 15 | 85 | 17 | Yiddish | 27.5 | 14 | Malagasy | 40 | 14 | Malayalam | 50
63 | Marathi | 15 | 30 | 70 | 18 | Albanian | 26.25 | 14 | Sindhi | 40 | 14 | Mongolian | 50
64 | Mongolian | 40 | 50 | 50 | 19 | Armenian | 25 | 14 | Sundanese | 40 | 14 | Russian | 50
65 | Myanmar/Burmese | 12.5 | 25 | 75 | 19 | Filipino | 25 | 14 | Ukrainian | 40 | 14 | Scots Gaelic | 50
66 | Nepali | 2.5 | 5 | 95 | 19 | Irish | 25 | 15 | Corsican | 35 | 14 | Turkish | 50
67 | Norwegian | 27.5 | 45 | 55 | 20 | Corsican | 22.5 | 15 | Filipino | 35 | 14 | Yiddish | 50
68 | Pashto | 20 | 25 | 75 | 20 | Hindi | 22.5 | 15 | Hausa | 35 | 15 | Azerbaijani | 45
69 | Persian | 10 | 20 | 80 | 20 | Kyrgyz | 22.5 | 15 | Lao | 35 | 15 | Belarusian | 45
70 | Polish | 56.25 | 72.5 | 27.5 | 20 | Sundanese | 22.5 | 15 | Luxembourgish | 35 | 15 | Czech | 45
71 | Portuguese-BR | 60 | 80 | 20 | 20 | Telugu | 22.5 | 15 | Tamil | 35 | 15 | Esperanto | 45
72 | Portuguese-CV | 52.5 | 80 | 20 | 21 | Lao | 21.25 | 15 | Telugu | 35 | 15 | Japanese | 45
73 | Portuguese-PT | 52.5 | 65 | 35 | 22 | Icelandic | 20 | 15 | Vietnamese | 35 | 15 | Korean | 45
74 | Punjabi | 10 | 20 | 80 | 22 | Kannada | 20 | 16 | Kyrgyz | 32.5 | 15 | Romanian | 45
75 | Romanian | 42.5 | 55 | 45 | 22 | Lithuanian | 20 | 17 | Chichewa | 30 | 15 | Serbian | 45
76 | Russian | 40 | 50 | 50 | 22 | Pashto | 20 | 17 | Hindi | 30 | 16 | Bulgarian | 40
77 | Samoan | 15 | 25 | 75 | 22 | Tamil | 20 | 17 | Icelandic | 30 | 16 | Catalan | 40
78 | Scots Gaelic | 40 | 50 | 50 | 22 | Vietnamese | 20 | 17 | Lithuanian | 30 | 16 | French | 40
79 | Serbian | 47.5 | 55 | 45 | 23 | Chichewa | 17.5 | 17 | Marathi | 30 | 16 | Galician | 40
80 | Sesotho | 32.5 | 60 | 40 | 23 | Javanese | 17.5 | 17 | Shona | 30 | 16 | Igbo | 40
81 | Shona | 17.5 | 30 | 70 | 23 | Shona | 17.5 | 17 | Somali | 30 | 16 | Italian | 40
82 | Sindhi | 30 | 40 | 60 | 23 | Uzbek | 17.5 | 17 | Zulu | 30 | 16 | Kazakh | 40

Methodology:

We translated 20 common English phrases to 102 languages, and then sent a re-worded explanation of the phrase to independent evaluators. We did not send the original English phrase, because early testing showed that knowing the actual words in question influenced the way that evaluators understood the query. For example, if we showed "out of steam" along with its explanation, some evaluators would see that a word for "steam" was given and therefore judge the translation highly, whereas if they only saw the explanation "no more energy (exhausted)", they were able to judge whether the proposed translation captured the usual English meaning.

We tested clusters of two or more words that often occur together, and that have a meaning that is generally discernable when they do (as can be readily seen in Twitter searches); for example, tweets with “out cold” almost always imply unconsciousness. We did not test single words, which can be highly ambiguous (for example, right = correct, legal entitlement, politically conservative, not left, etc.), and therefore too arbitrary in isolation. Although GT does not advertise itself as a dictionary, single-word lookups are a major proportion of real world uses of the service. Nor did we test full sentences, which add a lot of complexity to scoring, since translations might include both correct and incorrect elements; moreover, humans can translate the same sentence many ways, making it impossible to impose a gold standard for full sentences, especially across dozens of languages. It should be noted that GT alters its vocabulary choice on the fly, so the words chosen to translate a phrase may not be those selected in a longer sentence; for example, “run of the mill” is represented in French by “course du moulin” in isolation, and “course de l’usine” when translating a longer tweet (both completely wrong), and other instances might produce other results.

All of the clusters contained the word “out”. This word is ubiquitous in English texts. It is extremely ambiguous in isolation, but often occurs in clusters with unmistakable meanings. WordReference.com gives definitions and French equivalents for nearly 1700 composed expressions that include “out”, from “a fish out of water” to “zoom out” (http://www.wordreference.com/enfr/out?start=1600). We chose 20 formulations that are lexicalized in WordReference as composed forms such as “out of style”, or that, as queried on Twitter, usually reduce to defined meanings when matched with other particular words, such as “out of milk”. The premise is that all of these items have been translated in an electronic dictionary, so are thus similarly viable as units for machine translation.

The expressions were not chosen to be especially simple or difficult, nor based on corpus frequency. Rather, they were chosen because they had clear meanings, and are broadly representative of the types of phrase that ordinary users are likely to seek to translate. The selection (see above) is therefore not rigidly scientific, and we leave it to the reader to decide whether the items provide a fair test of MT capability. The least-recognized phrase, “out cold”, resulted in just 3 “A” ratings (Hausa, Hindi, and Malaysian) and was only understandable to some extent 10 times, while the phrase “out of the office” produced an understandable result 83% of the time.

Scores are not absolute, for two reasons. First, the choice of expressions was arbitrary. A different selection of English expressions would generate different numerical results within each language; for example, scores would probably fall were a larger number of idioms to be included. However, relative results would likely remain the same; a language with high scores for our test set should perform highly with other data, while a language with low scores herein would have similarly low results with other input. Second, most language scores show the subjective opinion of a single reviewer. One could well argue that more reviewers per language would produce more reliable data. We had several languages with multiple reviewers, and found that inter-annotator disagreements were usually minor, with a handful of entries per language being judged good versus marginal, or marginal versus wrong. A single entry being ranked by different evaluators as good versus marginal does not change the Tarzan score and changes the Bard score by 2.5 points, and a disagreement between marginal and wrong changes both Tarzan and Bard by 2.5. We averaged the scores where annotators disagreed, and have kept disagreements visible in the full data release, in the "Itemized Translations and Scores" tab at the bottom of this page. Based on the inter-annotator disagreement values we discovered, the reader is advised to place mental error bars of ±10 around the score that is reported.
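
The Bard, Tarzan, and Fail definitions given in the notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet's actual formulas: the names `POINTS` and `language_scores` are hypothetical, the rating sheets are made up, and averaging reviewers per phrase before summing is one reading of "we averaged the scores where annotators disagreed".

```python
# Points awarded per phrase for each rating, as defined in the notes.
# With 20 phrases at up to 5 points each, the maximum per language is 100.
POINTS = {
    "bard":   {"A": 5.0, "B": 2.5, "C": 0.0},
    "tarzan": {"A": 5.0, "B": 5.0, "C": 0.0},
    "fail":   {"A": 0.0, "B": 0.0, "C": 5.0},
}


def language_scores(ratings):
    """Compute Bard, Tarzan, and Fail for one language.

    `ratings` holds one string per phrase, with one letter per reviewer:
    "A" for a single reviewer, "AB" when two reviewers disagreed.
    """
    if len(ratings) != 20:
        raise ValueError("expected ratings for exactly 20 phrases")
    if any(len(phrase) == 0 for phrase in ratings):
        raise ValueError("every phrase needs at least one rating")
    totals = {}
    for name, table in POINTS.items():
        # Assumption: average the reviewers' points per phrase, then sum.
        totals[name] = sum(
            sum(table[r] for r in phrase) / len(phrase) for phrase in ratings
        )
    return totals


# Hypothetical sheet from a single reviewer: 10 A, 4 B, 6 C.
single = ["A"] * 10 + ["B"] * 4 + ["C"] * 6
print(language_scores(single))

# Hypothetical sheet with one A/B disagreement and 19 C ratings.
split = ["AB"] + ["C"] * 19
print(language_scores(split))
```

The first sheet yields Bard 60, Tarzan 70, Fail 30. The second yields Bard 3.75, Tarzan 5, Fail 95, which shows how a single good-versus-marginal disagreement can produce a fractional Bard score while leaving Tarzan unchanged.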
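
The ranking columns appear to give tied languages the same rank and the next distinct score the next integer (for example, two languages at rank 2 are followed by rank 3). A minimal sketch of that "dense" ranking follows; the function name `dense_rank` and the alphabetical tie-break are assumptions, and the input is a small subset of Bard scores from the alphabetical table, so the resulting ranks differ from those in the full table.

```python
def dense_rank(scores):
    """Rank names from highest to lowest score.

    Tied scores share a rank, and the next distinct score gets the next
    integer. Ties are listed alphabetically (an assumption).
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    ranked = []
    rank = 0
    previous = None
    for name, value in ordered:
        if value != previous:
            rank += 1          # a new distinct score starts a new rank
            previous = value
        ranked.append((rank, name, value))
    return ranked


# Subset of Bard scores taken from the alphabetical table.
bard_subset = {
    "Afrikaans": 67.5,
    "Albanian": 26.25,
    "Amharic": 30,
    "Bosnian": 30,
    "Chinese": 55,
}
for rank, name, value in dense_rank(bard_subset):
    print(rank, name, value)
```

The same function reproduces a % Fail ordering when it is given Fail percentages instead, since that column also runs from the highest value down.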