Existing datasets in ESPnet2
Dataset | URL | Assigned groups
aidatatang_200zh (ZH) | http://www.openslr.org/resources/62 | rishubhg, kchian, zhenyub
aishell (ZH) | http://www.aishelltech.com/kysjcp
babel (~20 languages) | https://www.iarpa.gov/index.php/research-programs/babel
commonvoice (13 languages and beyond) | https://voice.mozilla.org/datasets | dberrebb, jhuynh, kayoy; emwillia, csking
csj (JP) | https://pj.ninjal.ac.jp/corpus_center/csj/en/
hkust (ZH) | https://catalog.ldc.upenn.edu/LDC2005S15
iwslt21_low_resource (SW) | http://www.openslr.org/25/ https://catalog.ldc.upenn.edu/LDC2017S05 https://gamayun.translatorswb.org/data/ https://iwslt.org/2021/low-resource
jsut (JP) | https://sites.google.com/site/shinnosuketakamichi/publication/jsut | cdeveral, shajomar, shravyab; rishubhg, kchian, zhenyub
jtubespeech (JP)
jv_openslr35 (JV) | http://www.openslr.org/35
ksponspeech (KR) | https://aihub.or.kr/aidata/105
laborotv (JP) | https://laboro.ai/column/eg-laboro-tv-corpus-jp
mls (8 languages) | http://www.openslr.org/94/
open_li52 (ASR 52 languages)
polyphone_swiss_french (FR) | http://catalog.elra.info/en-us/repository/browse/ELRA-S0030_02
primewords_chinese (ZH) | https://www.openslr.org/47/ | Yu Zhong, Phoebe Li; xding2, zhihaow2, qibinc
puebla_nahuatl (HPN) | https://www.openslr.org/92/
ru_open_stt (RU) | https://github.com/snakers4/open_stt
su_openslr36 (SU) | http://www.openslr.org/36
totonac (Totonac) | http://www.openslr.org/107/
vivos (VI) | https://ailab.hcmus.edu.vn/vivos/
voxforge (7 languages) | http://www.voxforge.org/
wenetspeech (ZH) | https://wenet-e2e.github.io/WenetSpeech/ | rishubhg, kchian, zhenyub
yesno (HE) | http://www.openslr.org/1/
yoloxochitl_mixtec (Yoloxochitl-Mixtec) | http://www.openslr.org/89
zeroth_korean (KR) | http://www.openslr.org/40

Other datasets
Dataset | URL | Assigned groups
NISP (Indian-accent ASR) | https://arxiv.org/pdf/2007.06021v1.pdf
aishell2 | https://github.com/espnet/espnet/tree/master/egs
hub4_spanish | https://github.com/espnet/espnet/tree/master/egs
openasr20 | https://sat.nist.gov/openasr20
vystadial recipe | http://www.openslr.org/6/
thuyg-20 | http://www.openslr.org/22/
Iban | http://www.openslr.org/24/ | cdeveral, shajomar, shravyab
ALFFA | http://www.openslr.org/25/
free_st_mandarin | http://www.openslr.org/38/ | cxcui, yunhsua3, haiweng; xding2, zhihaow2, qibinc
neroico | http://www.openslr.org/39/ | cdeveral, shajomar, shravyab
zeroth_korean | http://www.openslr.org/40/
tunisian_msa | http://www.openslr.org/46/
primewords_chinese | http://www.openslr.org/47/ | szchang
sinhala_openslr52 | http://www.openslr.org/52/
bengali_openslr52 | http://www.openslr.org/53/ | sakter, rdutt, bguda
nepali_openslr54 | http://www.openslr.org/54/
acceted_french_openslr56 | http://www.openslr.org/57/ | suminpar, yerinh, mvijay; dberrebb, jhuynh, kayoy
pansori_ted_x_kr | http://www.openslr.org/58/ | cdeveral, shajomar, shravyab
parlament_parla | http://www.openslr.org/59/
tedx_spanish | http://www.openslr.org/67/ | skathpal, apsharma, vayudian
magicdata_mandarin_read_speech | http://www.openslr.org/68/ | njanders, taiqih
russian_librispeech | http://www.openslr.org/96/
ksc | http://www.openslr.org/102/
nicolingua_0003_african_radio | http://www.openslr.org/105/
nicolingua_0004_african_va | http://www.openslr.org/106/
mediaspeech | http://www.openslr.org/108/ | asrivas4, pmjoshi, sameerj; hfu2, wangchew, yaushiaw
samromur_2105 | http://www.openslr.org/112/
seoul_corpus | http://www.openslr.org/113/
golos | http://www.openslr.org/114/
IISc-MILE Kannada ASR Corpus | http://openslr.org/126/

Proposed datasets
Dataset | Information (URL, reference, etc.) | Assigned groups
LDCIL () | https://data.ldcil.org/index.php?route=common/home | sujayk, lagupta, amahabal; sakter, rdutt, bguda; sumita, surajt, nyarrabe; kmahajan, sphal, nikhilgu
kannada | https://openslr.org/79/ | sujayk, lagupta, amahabal
marathi | https://openslr.org/64/ | sujayk, lagupta, amahabal; kmahajan, sphal, nikhilgu; asrivas4, pmjoshi, sameerj
Microsoft Speech Corpus (Telugu) | https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e | siddhana, aostapen, cnariset; sakter, rdutt, bguda; sumita, surajt, nyarrabe; salonim, ktodi, kimihiro
Dsing30 | https://www.isca-speech.org/archive/pdfs/interspeech_2019/dabike19_interspeech.pdf | jiatongs, fangzhex, xinyuech
MInDS-14 (Korean) | https://arxiv.org/abs/2104.08524 | youngmik, kalvinc, karthikg
burmese_openslr80 | https://openslr.org/80/ | asrivas4, pmjoshi, sameerj
malayalam_openslr63 | https://www.openslr.org/63/ | buk, pup, rmampill
commonvoice Guaraní | https://commonvoice.mozilla.org/en/datasets | nrrobins, aogayo
Puerto Rican Spanish | http://www.openslr.org/74/ | asrivas4, pmjoshi, sameerj
Microsoft Speech Corpus (Tamil) | https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e | salonim, ktodi, kimihiro
Microsoft Speech Corpus (Gujarati) | https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e | salonim, ktodi, kimihiro; sujayk, lagupta, amahabal
Gujarati | http://openslr.org/78/ | sujayk, lagupta, amahabal
Sichuan Dialect Scripted Speech Corpus | https://magichub.com/datasets/sichuan-dialect-scripted-speech-corpus-daily-use-sentence/ | schen4, zhiruow