Algorithms that Recognize Uncommon Spoken Languages
Cheng-I Jeff Lai
MIT Horizon 2022
A Challenge for Automatic Speech Recognition
Conventional Automatic Speech Recognition
Towards End-to-End Speech Recognition (Li et al. ISCSLP Tutorial 4 2018)
Building a Speech Recognizer with 10min of data
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition (Yu et al. arxiv 2020)
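The semi-supervised recipe behind "10 minutes of labeled data" systems relies on pseudo-labeling: a teacher model transcribes unlabeled audio, and only its confident predictions are added back to the training set. A minimal sketch of that selection loop, with a hypothetical stand-in teacher (not the actual model from Yu et al. 2020):

```python
def teacher_predict(x):
    """Hypothetical stand-in teacher: returns (label, confidence) for an input.
    Toy rule: positive inputs are class 1; confidence grows with magnitude."""
    label = 1 if x > 0 else 0
    confidence = min(1.0, abs(x))
    return label, confidence

def pseudo_label(unlabeled, threshold=0.8):
    """Keep only confident teacher predictions as new training pairs."""
    selected = []
    for x in unlabeled:
        y, conf = teacher_predict(x)
        if conf >= threshold:
            selected.append((x, y))
    return selected

labeled = [(0.5, 1), (-0.7, 0)]        # the tiny supervised set
unlabeled = [2.0, 0.1, -1.5, -0.05]    # plentiful unlabeled audio stand-ins
augmented = labeled + pseudo_label(unlabeled)
# Only the confident examples (2.0 and -1.5) survive the threshold.
```

In the real pipeline this loop is iterated: the student trained on the augmented set becomes the next teacher.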
Core Technology: Self-Supervised Pre-Training
Raw speech waveforms
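Self-supervised pre-training needs no transcripts: spans of the input are hidden and the model learns to reconstruct or identify them from context. A heavily simplified sketch of wav2vec 2.0-style span masking over a frame sequence (the masking parameters here are illustrative, not the paper's values):

```python
import random

def mask_spans(frames, span_len=3, mask_prob=0.2, seed=0):
    """Randomly choose span start positions and mask them out.
    Returns the masked sequence and a dict of position -> original frame
    that the model must predict."""
    rng = random.Random(seed)
    masked = list(frames)
    targets = {}
    i = 0
    while i < len(frames):
        if rng.random() < mask_prob:
            for j in range(i, min(i + span_len, len(frames))):
                targets[j] = frames[j]
                masked[j] = None  # stands in for the learned mask embedding
            i += span_len
        else:
            i += 1
    return masked, targets
```

The pre-training loss is then computed only at the masked positions, which is what lets the model exploit raw, untranscribed audio.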
Core Technology: Fine-Tuning with Minimal Data
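Fine-tuning typically adds a CTC head on top of the pre-trained encoder. The decoding rule that turns per-frame CTC outputs into a transcript is simple enough to sketch exactly: collapse repeated symbols, then drop blanks (this is standard greedy CTC decoding, not a recipe specific to any one paper):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Frames 0,3,3,0,3,5,5,0 decode to the symbol sequence 3,3,5:
# the blank between the two 3s keeps them as distinct symbols.
```

The blank symbol is what allows CTC to emit the same character twice in a row, which is why a tiny labeled set suffices to adapt a strong pre-trained encoder.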
Core Technology: Transformer Architecture
Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al. NeurIPS 2020)
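The Transformer encoder at the heart of wav2vec 2.0 mixes information across time steps via self-attention. A toy single-head version with identity projections (the real model learns separate query/key/value matrices; this sketch only shows the scaled dot-product mixing):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Single-head self-attention with identity Q/K/V projections (toy).
    X: list of d-dimensional vectors; returns one mixed vector per input."""
    d = len(X[0])
    out = []
    for q in X:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        # Attention-weighted average of the value vectors.
        out.append([sum(wi * v[j] for wi, v in zip(w, X)) for j in range(d)])
    return out
```

Because every frame attends to every other frame, the encoder can use context seconds away, which is key for disambiguating speech sounds.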
Scaling up to 53-Language Speech Recognition
Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al. Interspeech 2021)
Scaling up to 128-Language ASR and Speech Translation
XLS-R: Self-Supervised Cross-Lingual Speech Representation Learning at Scale (Babu et al. arxiv 2021)
Efficiency and Universal Benchmark for Self-Supervised Learning
PARP: Prune, Adjust, Re-Prune for Self-Supervised Speech Recognition (Lai et al. NeurIPS 2021)
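PARP's core idea fits in a few lines: magnitude-prune the weights, then run a dense "adjust" update in which zeroed weights can revive, then re-prune. A toy sketch over a flat weight list (hypothetical learning rate and gradients; the real method operates on full network layers):

```python
def prune_mask(weights, sparsity):
    """Magnitude pruning: mask out the smallest-|w| fraction of weights."""
    k = int(len(weights) * sparsity)
    smallest = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in smallest else 1.0 for i in range(len(weights))]

def parp_step(weights, grads, lr=0.1, sparsity=0.5):
    """One Prune-Adjust-Re-Prune cycle."""
    mask = prune_mask(weights, sparsity)
    pruned = [w * m for w, m in zip(weights, mask)]
    # Dense update: pruned weights also receive gradient, so they may revive.
    adjusted = [w - lr * g for w, g in zip(pruned, grads)]
    final_mask = prune_mask(adjusted, sparsity)
    return [w * m for w, m in zip(adjusted, final_mask)]
```

The dense adjust step is what distinguishes PARP from one-shot pruning: a weight zeroed in the first pass can re-enter the subnetwork if the downstream task needs it.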
Efficiency and Universal Benchmark for Self-Supervised Learning
SUPERB: Speech processing Universal PERformance Benchmark (Yang et al. Interspeech 2021)
An Open-Source & Ongoing Effort
Toolkits on GitHub:
Ongoing Open Challenges:
Thank you! clai24@mit.edu