Project Background
Generate both text and speech data in South Africa’s official languages
Open-access, public-domain datasets
Language data from domains beyond the government domain (which the NCHLT Text Corpus covers)
Develop a digital presence for under-resourced languages (Tshivenda, Xitsonga, etc.)
Empower language communities
Establish a point of departure for language projects and technologies
Mozilla Common Voice
How does Common Voice work?
For Researchers
https://github.com/common-voice/common-voice
https://common-voice.github.io/community-playbook/sub_pages/mobilization.html