DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma
Mar 2023
Abstract
Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech recognition (ASR). However, its large model size and non-streaming architecture make it hard to use in low-resource or streaming scenarios. In this work, we propose a two-stage knowledge distillation method to solve these two problems: the first stage makes the big, non-streaming teacher model smaller, and the second stage makes it streaming. Specifically, we adopt the MSE loss for the distillation of hidden layers and a modified LF-MMI loss for the distillation of the prediction layer. Experiments are conducted on Gigaspeech, Librispeech, and an in-house dataset. The results show that the final distilled student model (DistillW2V2) is 8x faster and 12x smaller than the original teacher model. For the 480 ms latency setup, DistillW2V2's relative word error rate (WER) degradation ranges from 9% to 23.4% across the test sets, which reveals a promising way to extend the application scope of W2V2.
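To make the hidden-layer part of the distillation concrete, below is a minimal PyTorch-style sketch of an MSE loss between selected teacher and student hidden states, as the abstract describes. The layer selection, the linear projection used to match a narrower student dimension, and the equal weighting across layers are illustrative assumptions, not the authors' exact configuration; the modified LF-MMI loss for the prediction layer is not shown.

```python
import torch
import torch.nn as nn

class HiddenLayerDistillLoss(nn.Module):
    """MSE between chosen teacher and student hidden states (sketch)."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student features up to the teacher width when the student
        # is narrower (hypothetical choice; the paper may differ).
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.mse = nn.MSELoss()

    def forward(self, student_hiddens, teacher_hiddens):
        # Both arguments: lists of (batch, time, dim) tensors taken from the
        # layers selected for distillation; the teacher is kept frozen.
        losses = [
            self.mse(self.proj(s), t.detach())
            for s, t in zip(student_hiddens, teacher_hiddens)
        ]
        return torch.stack(losses).mean()
```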