1 of 13

Advancing AI with Speech: A Call for Tajik Contributions to Mozilla Common Voice

GDG Khujand

Muhammad Abdugafarov

December 21, 2024

2 of 13

Agenda

  • Introduction
  • Latest Trends in AI
  • The Case for SpeechAI
  • Challenges for Minority Languages
  • How can we solve this?
  • Common Voice Dataset

3 of 13

Why AI and Speech Matter Today

  • AI is transforming how we live and work globally
  • SpeechAI provides a natural way for humans to communicate with technology
  • Tajik developers have a unique opportunity to make a global impact

4 of 13

AI Trends Shaping the Future

  • Dominance of Large Language Models (LLMs) in various sectors
  • Rise of multimodal AI combining text, images, and audio
  • Growing demand for more natural interfaces like speech-based AI

5 of 13

Why Speech Matters in AI

  • Speech is the most intuitive form of human communication
  • It enhances accessibility and inclusivity (e.g., for visually impaired users)
  • SpeechAI is paving the way for voice assistants, real-time translation, and more

6 of 13

The Minority Language Gap

  • Most AI systems support major languages but leave out many others, including Tajik
  • A lack of labeled datasets is the main barrier
  • Without data, AI models cannot learn or support these languages

7 of 13

Closing the Gap for Tajik SpeechAI

  • Developers can help by contributing to open-source platforms like Mozilla Common Voice
  • Collecting and validating speech data ensures Tajik is represented in AI systems
  • Your contributions can enable tools like speech-to-text, language learning, and accessibility apps

8 of 13

Common Voice Dataset

  • An open-source initiative to collect diverse speech datasets
  • Supports over 100 languages, including minority ones
  • Provides tools for recording, validating, and managing speech data
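
To make the validation idea concrete, here is a minimal sketch of how a developer might summarize clip metadata. It is illustrative only: the sample rows are invented, the column names (client_id, path, sentence, up_votes, down_votes) are assumed to mirror those in Common Voice TSV releases, and the vote rule in `summarize_clips` is a hypothetical threshold, not the project's official validation logic.

```python
import csv
import io
from collections import Counter

# Invented sample rows for illustration; a real release ships much larger TSV files.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\n"
    "speaker_a\tclip_001.mp3\tСалом, ҷаҳон!\t2\t0\n"
    "speaker_a\tclip_002.mp3\tМан дар Хуҷанд зиндагӣ мекунам.\t1\t2\n"
    "speaker_b\tclip_003.mp3\tИмрӯз ҳаво хуб аст.\t3\t1\n"
)


def summarize_clips(tsv_text, min_up_votes=2):
    """Return (total clips, accepted clips per speaker) for tab-separated metadata."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    accepted = Counter()
    total = 0
    for row in reader:
        total += 1
        up = int(row["up_votes"])
        down = int(row["down_votes"])
        # Assumed rule for this sketch: enough up-votes, and more up than down.
        if up >= min_up_votes and up > down:
            accepted[row["client_id"]] += 1
    return total, dict(accepted)


total, accepted = summarize_clips(SAMPLE_TSV)
print(f"{total} clips, accepted per speaker: {accepted}")
```

Counting accepted clips per speaker is a quick way to see whether a small dataset depends on only a few voices, which matters for a language with few contributors so far.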

9 of 13

10 of 13

STATS

11 of 13

Join Common Voice

Tajik Contributors

12 of 13

Let’s help AI speak Tajik

13 of 13

Useful links