Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy

Automatic speech recognition (ASR) is an essential technique for human-computer interaction, and gain control is a commonly used operation in ASR pipelines. However, an inappropriate gain control strategy can increase the word error rate (WER) of an ASR system. Because the relationship between gain control and WER has not been sufficiently analyzed or proven theoretically, realistic ASR systems have adopted various unconstrained gain control strategies, and the optimal gain control, i.e., the one that yields the lowest WER, is rarely achieved. This study proposes a gain control strategy named maximized original signal transmission (MOST) to minimize the adverse impact of gain control on ASR systems. First, the gain control strategy is modeled, and the quantitative relationship between the strategy and ASR performance is established using the noise figure index. Second, by analyzing this quantitative relationship, an optimal MOST gain control strategy with minimal performance degradation is derived theoretically. Finally, comprehensive comparative experiments on a Mandarin dataset show that the proposed MOST strategy significantly reduces the WER of the experimental ASR system, with a mean absolute WER reduction of 10% at -9 dB gain.
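To make the noise figure idea concrete, the following is a minimal sketch, not the paper's MOST strategy or its experimental setup. It assumes a simple model in which a noisy signal is scaled by a fixed gain and then quantized with clipping, and it uses the standard definition of noise figure as input SNR minus output SNR in dB. All function names, the 8-bit quantizer, and the synthetic sine-plus-noise test signal are hypothetical choices made only for illustration.

```python
import math
import random


def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain_and_quantize(samples, gain_db, bits=16):
    """Scale float samples (nominally in [-1, 1]), then quantize with clipping."""
    scale = db_to_linear(gain_db)
    full_scale = 2 ** (bits - 1) - 1
    quantized = []
    for x in samples:
        q = round(x * scale * full_scale)
        q = max(-full_scale - 1, min(full_scale, q))  # hard clipping
        quantized.append(q / full_scale)
    return quantized


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def noise_figure_db(clean, noise, gain_db, bits=16):
    """Noise figure in dB: input SNR minus output SNR of the gain stage."""
    gain = db_to_linear(gain_db)
    noisy = [s + n for s, n in zip(clean, noise)]
    output = apply_gain_and_quantize(noisy, gain_db, bits)
    # The ideal output signal is the clean signal scaled by the gain;
    # everything else in the output counts as noise.
    target = [gain * s for s in clean]
    output_noise = [o - t for o, t in zip(output, target)]
    snr_in_db = 10.0 * math.log10(power(clean) / power(noise))
    snr_out_db = 10.0 * math.log10(power(target) / power(output_noise))
    return snr_in_db - snr_out_db


def make_test_signal(num_samples=8000, sample_rate=16000, seed=0):
    """Hypothetical test input: a 440 Hz sine plus low-level Gaussian noise."""
    rng = random.Random(seed)
    clean = [0.5 * math.sin(2.0 * math.pi * 440.0 * i / sample_rate)
             for i in range(num_samples)]
    noise = [rng.gauss(0.0, 0.005) for _ in range(num_samples)]
    return clean, noise


if __name__ == "__main__":
    clean, noise = make_test_signal()
    for gain_db in (0.0, -9.0, -20.0):
        nf = noise_figure_db(clean, noise, gain_db, bits=8)
        print(f"gain {gain_db:+.0f} dB -> noise figure {nf:.2f} dB")
```

In this toy model the quantization step is fixed, so attenuating the signal makes the quantization noise larger relative to the scaled signal and the noise figure grows. A large positive gain would instead introduce clipping distortion. How the paper relates the noise figure to WER is not reproduced here.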


