IEEE Trans Neural Netw Learn Syst. 2017 Aug;28(8):1959-1965. doi: 10.1109/TNNLS.2016.2550532. Epub 2016 Apr 14.
Model adaptation is a key technique that enables a modern automatic speech recognition (ASR) system to adjust its parameters, using a small amount of enrolment data, to the nuances in the speech spectrum caused by microphone mismatch between the training and test data. In this brief, we investigate four different adaptation schemes for connectionist (also known as hybrid) ASR systems that learn microphone-specific hidden unit contributions from the available adaptation material. This is achieved by adopting one of the following schemes: 1) the use of Hermite activation functions; 2) the introduction of bias and slope parameters in the sigmoid activation functions; 3) the injection of an amplitude parameter specific to each sigmoid unit; or 4) the combination of 2) and 3). This simple yet effective solution allows the adapted model to be stored in very little space, a highly desirable property for deep neural network adaptation algorithms intended for large-scale online deployment. Experimental results indicate that the investigated approaches reduce word error rates on the standard Spoke 6 task of the Wall Street Journal corpus compared with unadapted ASR systems. Moreover, the proposed adaptation schemes all perform better than simple multicondition training and compare favorably with conventional linear regression-based approaches while using up to 15 orders of magnitude fewer parameters. The proposed adaptation strategies are also effective when only a single adaptation sentence is available.
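Schemes 2) to 4) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a hidden unit computes h_j = a_j * sigmoid(s_j * z_j + c_j), where z_j is the pre-activation produced by the frozen, microphone-independent weights, and that the amplitude a_j, slope s_j and bias c_j are the only per-microphone parameters. The names `AdaptiveSigmoidLayer` and `affine_transform_params` and the 2048-unit layer size are hypothetical. The Hermite scheme 1) is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List


def sigmoid(v: float) -> float:
    """Numerically stable logistic sigmoid."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


@dataclass
class AdaptiveSigmoidLayer:
    """Hidden layer with frozen weights plus per-unit adaptation parameters.

    Hypothetical parameterization (an assumption, not taken from the paper):
        h_j = amp_j * sigmoid(slope_j * z_j + bias_j),  z_j = W[j] . x + b[j]
    Only amp, slope and bias would be re-estimated per microphone.
    """
    weights: List[List[float]]  # frozen, shape (n_units, n_inputs)
    offsets: List[float]        # frozen, shape (n_units,)
    amp: List[float] = field(default_factory=list)    # scheme 3: amplitude
    slope: List[float] = field(default_factory=list)  # scheme 2: slope
    bias: List[float] = field(default_factory=list)   # scheme 2: bias

    def __post_init__(self) -> None:
        n = len(self.weights)
        # Identity initialisation: amp=1, slope=1, bias=0 reproduces the
        # unadapted sigmoid layer exactly.
        self.amp = self.amp or [1.0] * n
        self.slope = self.slope or [1.0] * n
        self.bias = self.bias or [0.0] * n

    def forward(self, x: List[float]) -> List[float]:
        out = []
        for j, row in enumerate(self.weights):
            z = sum(w * xi for w, xi in zip(row, x)) + self.offsets[j]
            out.append(self.amp[j] * sigmoid(self.slope[j] * z + self.bias[j]))
        return out

    def num_adaptation_params(self) -> int:
        # Scheme 4 stores three scalars per hidden unit for each microphone.
        return len(self.amp) + len(self.slope) + len(self.bias)


def affine_transform_params(dim: int) -> int:
    # Point of comparison: a full dim x dim linear transform plus a bias vector.
    return dim * dim + dim


if __name__ == "__main__":
    layer = AdaptiveSigmoidLayer(weights=[[0.5, -1.0], [2.0, 0.0]],
                                 offsets=[0.0, -1.0])
    x = [1.0, 0.5]
    print(layer.forward(x))          # unadapted: sigmoid(0.0), sigmoid(1.0)
    layer.amp[0] = 2.0               # pretend adaptation rescaled unit 0
    print(layer.forward(x)[0])       # 2.0 * sigmoid(0.0)
    print(layer.num_adaptation_params(), affine_transform_params(2048))
```

Because the identity initialisation reproduces the unadapted network, adaptation starts from the microphone-independent model, and only the three per-unit vectors need to be stored for each microphone. For a hypothetical 2048-unit layer, that is 3 × 2048 = 6144 values, versus 2048 × 2048 + 2048 = 4,196,352 for a full affine transform of the same layer.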