Qu Leyuan, Weber Cornelius, Wermter Stefan
Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany; Department of Artificial Intelligence, Zhejiang Laboratory, Hangzhou, China.
Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany.
Neural Netw. 2023 Apr;161:494-504. doi: 10.1016/j.neunet.2023.01.027. Epub 2023 Feb 10.
Due to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-of-vocabulary (OOV) words, such as trending words and new named entities, pose problems for modern ASR systems, which require long training times to adapt their large numbers of parameters. In contrast to most previous research, which focuses on language-model post-processing, we tackle this problem at an earlier processing level and eliminate the bias in acoustic modeling so that OOV words can be recognized acoustically. We propose generating OOV words with text-to-speech systems and rescaling losses to encourage neural networks to pay more attention to OOV words. Specifically, when fine-tuning a previously trained model on synthetic audio, we either enlarge the classification loss of utterances that contain OOV words (sentence-level) or rescale the gradient used for back-propagation for OOV words (word-level). To overcome catastrophic forgetting, we also explore combining loss rescaling with model regularization, i.e., L2 regularization and elastic weight consolidation (EWC). Compared with previous methods that simply fine-tune on synthetic audio with EWC, experimental results on the LibriSpeech benchmark show that our proposed loss rescaling approach achieves a significant improvement in recall rate with only a slight decrease in word error rate. Moreover, word-level rescaling is more stable than utterance-level rescaling and leads to higher recall and precision on OOV word recognition. Furthermore, our proposed combination of loss rescaling and weight consolidation supports continual learning of an ASR system.
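The abstract describes the method only at a high level; the following is a minimal PyTorch sketch of the two rescaling variants and the EWC penalty it mentions, under stated assumptions. The function names, the scale factor oov_scale, and the Fisher-information bookkeeping are illustrative choices, not the authors' released implementation.

```python
import torch

def utterance_level_rescaled_loss(per_utt_loss, contains_oov, oov_scale=2.0):
    # per_utt_loss: [batch] per-utterance losses; contains_oov: [batch] bool mask.
    # Utterances containing OOV words contribute oov_scale times more to the total loss
    # (the "sentence-level" rescaling described in the abstract).
    weights = torch.where(contains_oov,
                          torch.full_like(per_utt_loss, oov_scale),
                          torch.ones_like(per_utt_loss))
    return (weights * per_utt_loss).mean()

def word_level_gradient_rescale(logits, oov_mask, oov_scale=2.0):
    # Straight-through scaling: the forward value of `logits` is unchanged,
    # but gradients flowing back through positions aligned to OOV words
    # (oov_mask) are multiplied by oov_scale ("word-level" rescaling).
    scale = torch.ones_like(logits).masked_fill(oov_mask, oov_scale)
    return logits * scale - (logits * (scale - 1.0)).detach()

def ewc_penalty(model, fisher, old_params, lam=1.0):
    # Elastic weight consolidation: penalize drift of important parameters
    # (importance estimated by the Fisher information) away from the values
    # learned before fine-tuning on synthetic audio.
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty
```

In this sketch, the rescaled classification loss and the EWC term would simply be summed to form the fine-tuning objective; how the OOV mask is obtained (e.g., from forced alignments of the synthetic audio) is an assumption and is not specified in the abstract.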