Evaluation Scores of Google Translate in 102 Languages
The tables in this file report the scores of translations of the same 20 English phrases to all 102 languages in Google Translate, as rated by native or highly-competent speakers.

Evaluators rated each translation on a scale of A, B, or C (A = good translation; B = not correct, but someone could understand the idea; C = completely wrong).

The original English phrases given to Google Translate, and the English explanations given to the evaluators, are shown in the light purple table on the right. Click here for itemized translations and scores in all languages.

Methodology is discussed in the black boxes on the right. Click here for language-to-language confidence scores (Tarzan-to-Tarzan intelligibility ratings between non-English pairs).

Bard is a weighted ranking that indicates the proportion of translations that were judged close to human quality. Responses scored A = 5, B = 2.5, C = 0. Maximum score = 100.
Tarzan indicates the percentage of times that translations could be understood, regardless of whether they were judged as human quality. Responses scored A = 5, B = 5, C = 0. Maximum score = 100.
Fail is the percentage of times that translations were judged as completely wrong. Responses scored A = 0, B = 0, C = 5. Maximum score = 100.

Languages in blue are those that this page says run on the Neural Machine Translation Model: https://cloud.google.com/translate/docs/languages
White = Phrase-Based Machine Translation Model.
Languages in orange are those that this page says run on the Neural Machine Translation Model: https://translate.google.com/intl/en/about/languages/
Languages in bold are among the world's top 100 (https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers)
Green-headed table shows GT results for top 100 languages by number of native speakers
Dark-blue-headed table shows GT results for languages not in top 100
Spanish was evaluated separately for Spain (SP) and Latin America (LA). Portuguese was evaluated separately for Portugal (PT), Cape Verde (CV), and Brazil (BR). Traditional and Simplified Chinese were evaluated as one.

# | Original English phrase | English explanation shown to evaluators
1 | fly out of London | take an airplane from London
2 | like a bat out of hell | escaping as quickly as possible
3 | out cold | unconscious
4 | out of bounds | unacceptable
5 | out of breath | gasping for air (for example, after running)
6 | out of curiousity | because a person is casually interested in something
7 | out of focus | not clear to see (blurry)
8 | out of his mind | crazy
9 | out of milk | the supply of milk is finished
10 | out of order | does not function (broken)
11 | out of pocket | paid for something from personal money
12 | out of steam | no more energy (exhausted)
13 | out of style | unfashionable
14 | out of the closet | openly homosexual
15 | out of the game | no longer participating in a game
16 | out of the office | away from the office
17 | out of this world | excellent
18 | out of time | a deadline has passed
19 | out of wedlock | between partners who are not married
20 | out on the town | having a fun time going shopping or to bars/restaurants (carousing)

# | Alphabetical | Bard | Tarzan | Fail | Rank | Bard Ranking | Bard | Rank | Tarzan Ranking | Tarzan | Rank | % Fail Ranking | % Fail
1 | Afrikaans | 67.5 | 87.5 | 12.5 | 1 | Afrikaans | 67.5 | 1 | Afrikaans | 87.5 | 1 | Bengali | 100
2 | Albanian | 26.25 | 40 | 60 | 2 | German | 60 | 2 | German | 82.5 | 1 | Haitian Creole | 100
3 | Amharic | 30 | 40 | 60 | 2 | Portuguese-PT | 60 | 3 | Portuguese-BR | 80 | 1 | Tajik | 100
4 | Arabic | 32.5 | 40 | 60 | 3 | Spanish-SP | 57.5 | 3 | Portuguese-CV | 80 | 2 | Kurdish | 95
5 | Armenian | 25 | 40 | 60 | 4 | Polish | 56.25 | 4 | Spanish-LA | 75 | 2 | Nepali | 95
6 | Azerbaijani | 37.5 | 55 | 45 | 5 | Chinese | 55 | 4 | Spanish-SP | 75 | 3 | Latin | 90
7 | Basque | 37.5 | 47.5 | 52.5 | 5 | Croatian | 55 | 5 | Polish | 72.5 | 3 | Malaysian | 90
8 | Belarusian | 40 | 55 | 45 | 5 | Spanish-LA | 55 | 6 | Danish | 70 | 3 | Urdu | 90
9 | Bengali | 0 | 0 | 100 | 6 | Dutch | 52.5 | 6 | Greek | 70 | 4 | Maori | 85
10 | Bosnian | 30 | 40 | 60 | 6 | Galician | 52.5 | 7 | Chinese | 65 | 5 | Cebuano | 80

Methodology:

We translated 20 common English phrases into 102 languages, and then sent a reworded explanation of each phrase to independent evaluators. We did not send the original English phrase, because early testing showed that knowing the actual words in question influenced the way that evaluators understood the query. For example, if we showed "out of steam" along with its explanation, some evaluators would see that a word for "steam" was given and therefore judge the translation highly, whereas if they only saw the explanation "no more energy (exhausted)", they were able to judge whether the proposed translation captured the usual English meaning.

11 | Bulgarian | 40 | 60 | 40 | 6 | Greek | 52.5 | 7 | Croatian | 65 | 5 | Georgian | 80
12 | Catalan | 37.5 | 60 | 40 | 6 | Portuguese-BR | 52.5 | 7 | Dutch | 65 | 5 | Persian | 80
13 | Cebuano | 12.5 | 20 | 80 | 6 | Portuguese-CV | 52.5 | 7 | Finnish | 65 | 5 | Punjabi | 80
14 | Chichewa | 17.5 | 30 | 70 | 7 | Italian | 50 | 7 | Hungarian | 65 | 5 | Uzbek | 80
15 | Chinese | 55 | 65 | 35 | 7 | Latvian | 50 | 7 | Portuguese-PT | 65 | 6 | Xhosa | 77.5
16 | Corsican | 22.5 | 35 | 65 | 8 | Indonesian | 47.5 | 8 | Welsh | 62.5 | 7 | Hawaiian | 75
17 | Croatian | 55 | 65 | 35 | 8 | Hungarian | 47.5 | 9 | Indonesian | 60 | 7 | Javanese | 75
18 | Czech | 43.75 | 55 | 45 | 8 | Igbo | 47.5 | 9 | Bulgarian | 60 | 7 | Lao | 75

We tested clusters of two or more words that often occur together, and that have a meaning that is generally discernible when they do (as can be readily seen in Twitter searches); for example, tweets with “out cold” almost always imply unconsciousness. We did not test single words, which can be highly ambiguous (for example, right = correct, legal entitlement, politically conservative, not left, etc.), and are therefore too arbitrary in isolation. Although GT does not advertise itself as a dictionary, single-word lookups are a major proportion of real-world uses of the service. Nor did we test full sentences, which add a lot of complexity to scoring, since translations might include both correct and incorrect elements; moreover, humans can translate the same sentence many ways, making it impossible to impose a gold standard for full sentences, especially across dozens of languages. It should be noted that GT alters its vocabulary choice on the fly, so the words chosen to translate a phrase in isolation may not be those selected in a longer sentence; for example, “run of the mill” is represented in French by “course du moulin” in isolation, and “course de l’usine” when translating a longer tweet (both completely wrong), and other instances might produce other results.

19 | Danish | 40 | 70 | 30 | 8 | Serbian | 47.5 | 9 | Catalan | 60 | 7 | Myanmar/Burmese | 75
20 | Dutch | 52.5 | 65 | 35 | 9 | Finnish | 45 | 9 | French | 60 | 7 | Pashto | 75
21 | Esperanto | 40 | 55 | 45 | 9 | French | 45 | 9 | Galician | 60 | 7 | Samoan | 75
22 | Estonian | 27.5 | 45 | 55 | 9 | Hebrew | 45 | 9 | Igbo | 60 | 7 | Swahili | 75
23 | Filipino | 25 | 35 | 65 | 9 | Swedish | 45 | 9 | Italian | 60 | 7 | Thai | 75
24 | Finnish | 45 | 65 | 35 | 10 | Czech | 43.75 | 9 | Kazakh | 60 | 7 | Yoruba | 75
25 | French | 45 | 60 | 40 | 10 | Welsh | 43.75 | 9 | Latvian | 60 | 8 | Chichewa | 70
26 | Frisian | 30 | 40 | 60 | 11 | Japanese | 42.5 | 9 | Macedonian | 60 | 8 | Hindi | 70
27 | Galician | 52.5 | 60 | 40 | 11 | Kazakh | 42.5 | 9 | Maltese | 60 | 8 | Icelandic | 70
28 | Georgian | 10 | 20 | 80 | 11 | Korean | 42.5 | 9 | Sesotho | 60 | 8 | Lithuanian | 70
29 | German | 60 | 82.5 | 17.5 | 11 | Malaysian | 42.5 | 9 | Slovenian | 60 | 8 | Marathi | 70

All of the clusters contained the word “out”. This word is ubiquitous in English texts. It is extremely ambiguous in isolation, but often occurs in clusters with unmistakable meanings. WordReference.com gives definitions and French equivalents for nearly 1700 composed expressions that include “out”, from “a fish out of water” to “zoom out” (http://www.wordreference.com/enfr/out?start=1600). We chose 20 formulations that are lexicalized in WordReference as composed forms, such as “out of style”, or that, as queried on Twitter, usually reduce to defined meanings when matched with other particular words, such as “out of milk”. The premise is that all of these items have been translated in an electronic dictionary, and are thus similarly viable as units for machine translation.

30 | Greek | 52.5 | 70 | 30 | 11 | Romanian | 42.5 | 9 | Swedish | 60 | 8 | Shona | 70
31 | Gujarati | 30 | 45 | 55 | 12 | Belarusian | 40 | 10 | Azerbaijani | 55 | 8 | Somali | 70
32 | Haitian Creole | 0 | 0 | 100 | 12 | Bulgarian | 40 | 10 | Belarusian | 55 | 8 | Zulu | 70
33 | Hausa | 27.5 | 35 | 65 | 12 | Danish | 40 | 10 | Czech | 55 | 9 | Kyrgyz | 67.5
34 | Hawaiian | 15 | 25 | 75 | 12 | Esperanto | 40 | 10 | Esperanto | 55 | 10 | Corsican | 65
35 | Hebrew | 45 | 45 | 55 | 12 | Mongolian | 40 | 10 | Japanese | 55 | 10 | Filipino | 65
36 | Hindi | 22.5 | 30 | 70 | 12 | Russian | 40 | 10 | Korean | 55 | 10 | Hausa | 65
37 | Hmong | 27.5 | 40 | 60 | 12 | Scots Gaelic | 40 | 10 | Romanian | 55 | 10 | Luxembourgish | 65
38 | Hungarian | 47.5 | 65 | 35 | 12 | Slovenian | 40 | 10 | Serbian | 55 | 10 | Tamil | 65

The expressions were not chosen to be especially simple or difficult, nor were they chosen based on corpus frequency. Rather, they were chosen because they have clear meanings and are broadly representative of the types of phrase that ordinary users are likely to seek to translate. The selection (see above) is therefore not rigidly scientific, and we leave it to the reader to decide whether the items provide a fair test of MT capability. The least-recognized phrase, “out cold”, resulted in just 3 “A” ratings (Hausa, Hindi, and Malaysian) and was only understandable to some extent 10 times, while the phrase “out of the office” produced an understandable result 83% of the time.

39 | Icelandic | 20 | 30 | 70 | 12 | Turkish | 40 | 11 | Malayalam | 50 | 10 | Telugu | 65
40 | Igbo | 47.5 | 60 | 40 | 13 | Azerbaijani | 37.5 | 11 | Mongolian | 50 | 10 | Vietnamese | 65
41 | Indonesian | 47.5 | 60 | 40 | 13 | Basque | 37.5 | 11 | Russian | 50 | 11 | Albanian | 60
42 | Irish | 25 | 45 | 55 | 13 | Catalan | 37.5 | 11 | Scots Gaelic | 50 | 11 | Amharic | 60
43 | Italian | 50 | 60 | 40 | 14 | Macedonian | 35 | 11 | Turkish | 50 | 11 | Arabic | 60
44 | Japanese | 42.5 | 55 | 45 | 14 | Maltese | 35 | 11 | Yiddish | 50 | 11 | Armenian | 60
45 | Javanese | 17.5 | 25 | 75 | 14 | Sinhala | 35 | 12 | Basque | 47.5 | 11 | Bosnian | 60

Scores are not absolute, for two reasons. First, the choice of expressions was arbitrary. A different selection of English expressions would generate different numerical results within each language; for example, scores would probably fall were a larger number of idioms to be included. However, relative results would likely remain the same; a language with high scores for our test set should perform well with other data, while a language with low scores herein would have similarly low results with other input. Second, most language scores show the subjective opinion of a single reviewer. One could well argue that more reviewers per language would produce more reliable data. We had several languages with multiple reviewers, and found that inter-annotator disagreements were usually minor, with a handful of entries per language being judged good versus marginal, or marginal versus wrong. A single entry being ranked by different evaluators as good versus marginal does not change the Tarzan score and changes the Bard score by 2.5 points, while a disagreement between marginal and wrong changes Bard by 2.5 and, under the scoring rules above, Tarzan by 5. We averaged the scores where annotators disagreed, and will keep disagreements visible in the full public data release. Based on the inter-annotator disagreement values we discovered, the reader is advised to place mental error bars of ±10 around the score that is reported.
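
The scoring rules stated in this file (A, B, and C mapped to points for each measure, 20 phrases per language, points averaged where reviewers disagree) can be sketched in Python. This is only an illustration under those stated rules, not the tooling used to build the tables: the names `POINTS` and `language_scores`, the one-letter-per-reviewer input format, and the sample ratings are all hypothetical.

```python
from statistics import mean

# Points per rating for each measure, as given in the scoring notes.
POINTS = {
    "bard":   {"A": 5.0, "B": 2.5, "C": 0.0},
    "tarzan": {"A": 5.0, "B": 5.0, "C": 0.0},
    "fail":   {"A": 0.0, "B": 0.0, "C": 5.0},
}


def language_scores(ratings):
    """Return Bard, Tarzan and Fail totals for one language.

    `ratings` holds one entry per phrase. Each entry is a string with one
    letter per reviewer (for example "A", or "AB" for two reviewers).
    Where reviewers disagree, their points are averaged (assumed to be a
    plain arithmetic mean).
    """
    totals = {}
    for measure, points in POINTS.items():
        totals[measure] = sum(
            mean(points[letter] for letter in entry) for entry in ratings
        )
    return totals


# Hypothetical ratings for 20 phrases from a single reviewer.
single = ["A"] * 8 + ["B"] * 6 + ["C"] * 6
print(language_scores(single))

# A hypothetical second reviewer rates the first phrase B instead of A.
double = ["AB"] + single[1:]
print(language_scores(double))
```

With the single-reviewer sample the totals are Bard 55, Tarzan 70, and Fail 30. The second reviewer's B on one phrase lowers Bard to 53.75 after averaging and leaves Tarzan and Fail unchanged.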
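
46 | Kannada | 20 | 40 | 60 | 15 | Arabic | 32.5 | 13 | Estonian | 45 | 11 | Frisian | 60
47 | Kazakh | 42.5 | 60 | 40 | 15 | Sesotho | 32.5 | 13 | Gujarati | 45 | 11 | Hmong | 60
48 | Khmer | 30 | 45 | 55 | 16 | Amharic | 30 | 13 | Hebrew | 45 | 11 | Kannada | 60
49 | Korean | 42.5 | 55 | 45 | 16 | Bosnian | 30 | 13 | Irish | 45 | 11 | Malagasy | 60
50 | Kurdish | 3.75 | 5 | 95 | 16 | Frisian | 30 | 13 | Khmer | 45 | 11 | Sindhi | 60
51 | Kyrgyz | 22.5 | 32.5 | 67.5 | 16 | Gujarati | 30 | 13 | Norwegian | 45 | 11 | Sundanese | 60
52 | Lao | 21.25 | 35 | 65 | 16 | Khmer | 30 | 13 | Sinhala | 45 | 11 | Ukrainian | 60
53 | Latin | 5 | 10 | 90 | 16 | Sindhi | 30 | 13 | Slovak | 45 | 12 | Estonian | 55
54 | Latvian | 50 | 60 | 40 | 16 | Ukrainian | 30 | 14 | Albanian | 40 | 12 | Gujarati | 55
55 | Lithuanian | 20 | 30 | 70 | 17 | Estonian | 27.5 | 14 | Amharic | 40 | 12 | Hebrew | 55
56 | Luxembourgish | 27.5 | 35 | 65 | 17 | Hausa | 27.5 | 14 | Arabic | 40 | 12 | Irish | 55
57 | Macedonian | 35 | 60 | 40 | 17 | Hmong | 27.5 | 14 | Armenian | 40 | 12 | Khmer | 55
58 | Malagasy | 27.5 | 40 | 60 | 17 | Luxembourgish | 27.5 | 14 | Bosnian | 40 | 12 | Norwegian | 55
59 | Malayalam | 42.5 | 50 | 50 | 17 | Malagasy | 27.5 | 14 | Frisian | 40 | 12 | Sinhala | 55
60 | Malaysian | 5 | 10 | 90 | 17 | Norwegian | 27.5 | 14 | Hmong | 40 | 12 | Slovak | 55
61 | Maltese | 35 | 60 | 40 | 17 | Slovak | 27.5 | 14 | Kannada | 40 | 13 | Basque | 52.5
62 | Maori | 10 | 15 | 85 | 17 | Yiddish | 27.5 | 14 | Malagasy | 40 | 14 | Malayalam | 50
63 | Marathi | 15 | 30 | 70 | 18 | Albanian | 26.25 | 14 | Sindhi | 40 | 14 | Mongolian | 50
64 | Mongolian | 40 | 50 | 50 | 19 | Armenian | 25 | 14 | Sundanese | 40 | 14 | Russian | 50
65 | Myanmar/Burmese | 12.5 | 25 | 75 | 19 | Filipino | 25 | 14 | Ukrainian | 40 | 14 | Scots Gaelic | 50
66 | Nepali | 2.5 | 5 | 95 | 19 | Irish | 25 | 15 | Corsican | 35 | 14 | Turkish | 50
67 | Norwegian | 27.5 | 45 | 55 | 20 | Corsican | 22.5 | 15 | Filipino | 35 | 14 | Yiddish | 50
68 | Pashto | 20 | 25 | 75 | 20 | Hindi | 22.5 | 15 | Hausa | 35 | 15 | Azerbaijani | 45
69 | Persian | 10 | 20 | 80 | 20 | Kyrgyz | 22.5 | 15 | Lao | 35 | 15 | Belarusian | 45
70 | Polish | 56.25 | 72.5 | 27.5 | 20 | Sundanese | 22.5 | 15 | Luxembourgish | 35 | 15 | Czech | 45
71 | Portuguese-BR | 60 | 80 | 20 | 20 | Telugu | 22.5 | 15 | Tamil | 35 | 15 | Esperanto | 45
72 | Portuguese-CV | 52.5 | 80 | 20 | 21 | Lao | 21.25 | 15 | Telugu | 35 | 15 | Japanese | 45
73 | Portuguese-PT | 52.5 | 65 | 35 | 22 | Icelandic | 20 | 15 | Vietnamese | 35 | 15 | Korean | 45
74 | Punjabi | 10 | 20 | 80 | 22 | Kannada | 20 | 16 | Kyrgyz | 32.5 | 15 | Romanian | 45
75 | Romanian | 42.5 | 55 | 45 | 22 | Lithuanian | 20 | 17 | Chichewa | 30 | 15 | Serbian | 45
76 | Russian | 40 | 50 | 50 | 22 | Pashto | 20 | 17 | Hindi | 30 | 16 | Bulgarian | 40
77 | Samoan | 15 | 25 | 75 | 22 | Tamil | 20 | 17 | Icelandic | 30 | 16 | Catalan | 40
78 | Scots Gaelic | 40 | 50 | 50 | 22 | Vietnamese | 20 | 17 | Lithuanian | 30 | 16 | French | 40
79 | Serbian | 47.5 | 55 | 45 | 23 | Chichewa | 17.5 | 17 | Marathi | 30 | 16 | Galician | 40
80 | Sesotho | 32.5 | 60 | 40 | 23 | Javanese | 17.5 | 17 | Shona | 30 | 16 | Igbo | 40
81 | Shona | 17.5 | 30 | 70 | 23 | Shona | 17.5 | 17 | Somali | 30 | 16 | Italian | 40
82 | Sindhi | 30 | 40 | 60 | 23 | Uzbek | 17.5 | 17 | Zulu | 30 | 16 | Kazakh | 40
83 | Sinhala | 35 | 45 | 55 | 23 | Yoruba | 17.5 | 18 | Hawaiian | 25 | 16 | Latvian | 40
84 | Slovak | 27.5 | 45 | 55 | 23 | Zulu | 17.5 | 18 | Javanese | 25 | 16 | Macedonian | 40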