Protonx Final Project

Wav2vec 2.0 for ASR

Nguyễn Đức Huy

20/08/2023


ASR pipeline

[Diagram: Speech signal -> Acoustic Model (wav2vec2) -> Embeddings -> Decode Algorithm -> Text. The Decode Algorithm also takes the Pronunciation Vocabulary and the Language Model (n-gram) as inputs.]
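
To make the decode step of the pipeline concrete, here is a minimal sketch of greedy CTC decoding. The slides do not name the decode algorithm, so CTC is an assumption here (it is the usual choice for fine-tuned wav2vec 2.0 models); the toy vocabulary, the `BLANK` symbol, and the frame scores are all hypothetical, and the n-gram language model is left out entirely.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol standing in for the CTC blank token


def greedy_ctc_decode(frame_scores, vocab):
    """Pick the best token per frame, merge consecutive repeats, drop blanks."""
    best = [
        vocab[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    collapsed = [token for token, _ in groupby(best)]
    return "".join(token for token in collapsed if token != BLANK)


# Toy example: 4-symbol vocabulary and 7 frames of per-symbol scores.
vocab = [BLANK, "a", "c", "t"]
frame_scores = [
    [0.1, 0.1, 0.7, 0.1],  # c
    [0.1, 0.1, 0.7, 0.1],  # c (repeat, merged)
    [0.7, 0.1, 0.1, 0.1],  # blank
    [0.1, 0.7, 0.1, 0.1],  # a
    [0.1, 0.7, 0.1, 0.1],  # a (repeat, merged)
    [0.7, 0.1, 0.1, 0.1],  # blank
    [0.1, 0.1, 0.1, 0.7],  # t
]
print(greedy_ctc_decode(frame_scores, vocab))
```

A decoder that uses the n-gram language model would replace the per-frame argmax with a beam search that scores candidate transcripts with both the acoustic scores and the language model.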


Model optimization

wav2vec 2.0 Base model:

94 million parameters
7 CNN layers + 12 Transformer layers

Optimization approach:

Model distillation

Student: a smaller wav2vec model, reduced to 6 Transformer layers
Teacher: the wav2vec Base model
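
The slides do not say which distillation objective was used. As one common possibility, the sketch below computes a soft-target loss for a single frame: the KL divergence between temperature-softened teacher and student distributions. The function names, the temperature value, and the example logits are assumptions for illustration, not the project's actual training code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for one audio frame over a 4-token vocabulary.
teacher = [4.0, 1.0, 0.5, 0.2]
student = [2.5, 1.5, 0.5, 0.1]
print(f"loss = {distillation_loss(student, teacher):.4f}")
```

In practice this term would be averaged over all frames and often combined with the ordinary supervised loss on the ground-truth transcripts.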


Model optimization results

Model      LM       #Parameters  Model size  CPU inference time  GPU inference time  WER
Original   w/o LM   94M          378 MB      1157.66 s           111.04 s            0.264
Original   with LM  94M          378 MB      -                   373.05 s            0.204
Distilled  w/o LM   51.9M        198 MB      810.70 s            89.90 s             0.175
Distilled  with LM  51.9M        198 MB      -                   169.51 s            0.112

Environment: Google Colab

GPU: 1 x Tesla T4
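
The table is easier to compare once it is expressed as ratios. The sketch below copies the reported numbers and derives speedups and relative reductions from them; the dictionary layout and helper names are illustrative choices, and CPU times are omitted for the "with LM" rows because the table does not report them.

```python
# Numbers copied from the results table above.
results = {
    ("Original", "w/o LM"): {"params_m": 94.0, "size_mb": 378, "cpu_s": 1157.66, "gpu_s": 111.04, "wer": 0.264},
    ("Original", "with LM"): {"params_m": 94.0, "size_mb": 378, "gpu_s": 373.05, "wer": 0.204},
    ("Distilled", "w/o LM"): {"params_m": 51.9, "size_mb": 198, "cpu_s": 810.70, "gpu_s": 89.90, "wer": 0.175},
    ("Distilled", "with LM"): {"params_m": 51.9, "size_mb": 198, "gpu_s": 169.51, "wer": 0.112},
}


def speedup(before, after):
    """How many times faster the 'after' run is."""
    return before / after


def relative_reduction(before, after):
    """Fraction by which 'after' is smaller than 'before'."""
    return (before - after) / before


for lm in ("w/o LM", "with LM"):
    orig, dist = results[("Original", lm)], results[("Distilled", lm)]
    print(f"[{lm}] GPU speedup: {speedup(orig['gpu_s'], dist['gpu_s']):.2f}x, "
          f"relative WER reduction: {relative_reduction(orig['wer'], dist['wer']):.1%}")

base, small = results[("Original", "w/o LM")], results[("Distilled", "w/o LM")]
print(f"CPU speedup (w/o LM): {speedup(base['cpu_s'], small['cpu_s']):.2f}x")
print(f"Parameter reduction: {relative_reduction(base['params_m'], small['params_m']):.1%}")
print(f"Model size reduction: {relative_reduction(base['size_mb'], small['size_mb']):.1%}")
```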


Model deployment

Back-end

Model converted to ONNX
Served with Triton Server

Deployed to the cloud (GCP)

Containerized (Docker)
Auto scaling
Load balancing
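
As a rough illustration of how a client could talk to such a back-end, the sketch below builds a request body for Triton's HTTP inference endpoint (`POST /v2/models/<model>/infer`). The model name `wav2vec2_asr`, the input tensor name `input_values`, and the host are hypothetical; the real names depend on how the ONNX model was exported and configured, which the slides do not show.

```python
import json


def infer_url(host, model_name):
    """URL of the HTTP inference endpoint for one model."""
    return f"http://{host}/v2/models/{model_name}/infer"


def build_infer_request(samples, input_name="input_values"):
    """Wrap raw audio samples as a single FP32 input tensor of shape [1, N]."""
    return {
        "inputs": [
            {
                "name": input_name,  # hypothetical tensor name
                "shape": [1, len(samples)],
                "datatype": "FP32",
                "data": [float(s) for s in samples],
            }
        ]
    }


# Hypothetical host and model name; 8000 is Triton's default HTTP port.
url = infer_url("localhost:8000", "wav2vec2_asr")
payload = build_infer_request([0.0, 0.01, -0.02])
print(url)
print(json.dumps(payload))
```

The JSON body can then be sent with any HTTP client; the response carries the output tensors, which still need to go through the decode step to become text.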


Model deployment

The model is stored in Google Cloud Storage
The Triton Server container image is stored in Google Container Registry

asia.gcr.io/protonx-asr/triton-asr-v1:23.05-py3 (private)
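
Triton loads models from a model repository with a fixed layout: `<repository>/<model name>/config.pbtxt` plus `<repository>/<model name>/<version>/model.onnx`. The sketch below creates that layout in a local temporary directory. The model name, tensor names, and vocabulary size are assumptions for illustration only; the project's actual configuration is not shown in the slides.

```python
import tempfile
from pathlib import Path

# Hypothetical config: tensor names and dims depend on the exported ONNX model.
CONFIG_TEMPLATE = """name: "{name}"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {{
    name: "input_values"
    data_type: TYPE_FP32
    dims: [ 1, -1 ]
  }}
]
output [
  {{
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 1, -1, {vocab_size} ]
  }}
]
"""


def create_model_repository(root, name, vocab_size, version=1):
    """Create <root>/<name>/config.pbtxt and the <version>/ directory.

    Returns the path where the exported model.onnx file should be placed.
    """
    model_dir = Path(root) / name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    config = CONFIG_TEMPLATE.format(name=name, vocab_size=vocab_size)
    (model_dir / "config.pbtxt").write_text(config)
    return version_dir / "model.onnx"


with tempfile.TemporaryDirectory() as repo:
    onnx_path = create_model_repository(repo, "wav2vec2_asr", vocab_size=110)
    print(onnx_path.relative_to(repo))
    print((Path(repo) / "wav2vec2_asr" / "config.pbtxt").read_text())
```

Once the same layout is uploaded to a bucket, Triton can be pointed at it with a `gs://` path in its `--model-repository` option.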


Model deployment

Deployed to the cloud (GCP)

Auto scaling
Load balancing


Deployment results

4 CPU

1 GPU Tesla T4
