
Emphasizing unseen words: New vocabulary acquisition for end-to-end speech recognition

Author information

Qu Leyuan, Weber Cornelius, Wermter Stefan

Affiliations

Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany; Department of Artificial Intelligence, Zhejiang Laboratory, Hangzhou, China.

Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany.

Publication information

Neural Netw. 2023 Apr;161:494-504. doi: 10.1016/j.neunet.2023.01.027. Epub 2023 Feb 10.

Abstract

Due to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-Of-Vocabulary (OOV) words, such as trending words and new named entities, pose problems to modern ASR systems that require long training times to adapt their large numbers of parameters. Different from most previous research focusing on language model post-processing, we tackle this problem on an earlier processing level and eliminate the bias in acoustic modeling to recognize OOV words acoustically. We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words. Specifically, we enlarge the classification loss used for training neural networks' parameters of utterances containing OOV words (sentence-level), or rescale the gradient used for back-propagation for OOV words (word-level), when fine-tuning a previously trained model on synthetic audio. To overcome catastrophic forgetting, we also explore the combination of loss rescaling and model regularization, i.e. L2 regularization and elastic weight consolidation (EWC). Compared with previous methods that just fine-tune synthetic audio with EWC, the experimental results on the LibriSpeech benchmark reveal that our proposed loss rescaling approach can achieve significant improvement on the recall rate with only a slight decrease on word error rate. Moreover, word-level rescaling is more stable than utterance-level rescaling and leads to higher recall rates and precision rates on OOV word recognition. Furthermore, our proposed combined loss rescaling and weight consolidation methods can support continual learning of an ASR system.
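The two rescaling variants described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the `oov_scale` value, and the per-token cross-entropy framing are assumptions for demonstration; the paper applies the idea while fine-tuning an end-to-end ASR model on synthetic (text-to-speech) audio.

```python
import numpy as np

def rescaled_oov_loss(log_probs, targets, oov_mask, oov_scale=5.0, level="word"):
    """Loss rescaling that emphasizes OOV words.

    log_probs: (T, V) log-softmax model outputs, one row per target token
    targets:   (T,) target token ids
    oov_mask:  (T,) boolean, True where the target token is out-of-vocabulary
    oov_scale: enlargement factor for OOV losses (illustrative value)
    level:     "word"     -> enlarge only the losses of OOV tokens
               "sentence" -> enlarge the whole utterance loss if it
                             contains any OOV token
    """
    # Per-token negative log-likelihood of the reference tokens.
    token_losses = -log_probs[np.arange(len(targets)), targets]
    if level == "word":
        # Word-level: upweight gradients only where the target is OOV.
        weights = np.where(oov_mask, oov_scale, 1.0)
        return float(np.mean(weights * token_losses))
    # Sentence-level: upweight the entire utterance when OOV words occur.
    factor = oov_scale if oov_mask.any() else 1.0
    return float(factor * np.mean(token_losses))
```

Word-level rescaling leaves in-vocabulary tokens untouched, which is consistent with the abstract's finding that it is more stable than utterance-level rescaling: it changes the gradient only for the OOV positions rather than amplifying the whole utterance.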

