The role of binary mask patterns in automatic speech recognition in background noise.

Affiliation

Department of Computer Science and Engineering and Center for Cognitive Science, The Ohio State University, Columbus, Ohio 43210, USA.

Publication information

J Acoust Soc Am. 2013 May;133(5):3083-93. doi: 10.1121/1.4798661.

Abstract

Processing noisy signals using the ideal binary mask improves automatic speech recognition (ASR) performance. This paper presents the first study that investigates the role of binary mask patterns in ASR under various noises, signal-to-noise ratios (SNRs), and vocabulary sizes. Binary masks are computed either by comparing the SNR within a time-frequency unit of a mixture signal with a local criterion (LC), or by comparing the local target energy with the long-term average spectral energy of speech. ASR results show that (1) akin to human speech recognition, binary masking significantly improves ASR performance even when the SNR is as low as -60 dB; (2) the ASR performance profiles are qualitatively similar to those obtained in human intelligibility experiments; (3) the difference between the LC and mixture SNR is more correlated to the recognition accuracy than LC; (4) LC at which the performance peaks is lower than 0 dB, which is the threshold that maximizes the SNR gain of processed signals. This broad agreement with human performance is rather surprising. The results also indicate that maximizing the SNR gain is probably not an appropriate goal for improving either human or machine recognition of noisy speech.
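
The two mask definitions described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the target and noise energies are already available per time-frequency unit (rows are frequency channels, columns are time frames), that the comparison is a strict "greater than", and that the second mask also takes a threshold in dB. The function names `ideal_binary_mask` and `target_binary_mask` and the parameters `lc_db` and `speech_ltass_energy` are hypothetical names chosen for illustration.

```python
import numpy as np

_EPS = 1e-12  # small floor to avoid log10(0) and division by zero (assumption)


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Label a time-frequency unit 1 when its local SNR (dB) exceeds LC, else 0."""
    target_energy = np.asarray(target_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    local_snr_db = 10.0 * np.log10((target_energy + _EPS) / (noise_energy + _EPS))
    return (local_snr_db > lc_db).astype(np.uint8)


def target_binary_mask(target_energy, speech_ltass_energy, lc_db=0.0):
    """Label a unit 1 when the local target energy exceeds the long-term
    average speech energy of its frequency channel by more than lc_db."""
    target_energy = np.asarray(target_energy, dtype=float)
    # One long-term average value per channel, broadcast across time frames.
    ltass = np.asarray(speech_ltass_energy, dtype=float)[:, np.newaxis]
    ratio_db = 10.0 * np.log10((target_energy + _EPS) / (ltass + _EPS))
    return (ratio_db > lc_db).astype(np.uint8)


if __name__ == "__main__":
    # Toy energies: 2 frequency channels x 2 time frames.
    target = np.array([[1.0, 100.0], [10.0, 0.1]])
    noise = np.ones_like(target)

    # Mask at some mixture SNR with LC = -5 dB.
    print(ideal_binary_mask(target, noise, lc_db=-5.0))

    # Lower the mixture SNR by 10 dB (noise energy x10) and LC by 10 dB.
    print(ideal_binary_mask(target, 10.0 * noise, lc_db=-15.0))
```

Both calls in the usage example print the same mask. Scaling the noise shifts every local SNR by the same number of dB, so the mask pattern under this definition depends only on the difference between LC and the mixture SNR, which is the quantity the abstract reports as more correlated with recognition accuracy than LC itself.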

Cited by

On Training Targets for Supervised Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2014 Dec;22(12):1849-1858. doi: 10.1109/TASLP.2014.2352935.

References

Robust speech recognition from binary masks.
J Acoust Soc Am. 2010 Nov;128(5):EL217-22. doi: 10.1121/1.3497358.

Speech perception of noise with binary gains.
J Acoust Soc Am. 2008 Oct;124(4):2303-7. doi: 10.1121/1.2967865.

A model for multitalker speech perception.
J Acoust Soc Am. 2008 Nov;124(5):3213-24. doi: 10.1121/1.2982413.
