The role of binary mask patterns in automatic speech recognition in background noise.

Affiliation

Department of Computer Science and Engineering and Center for Cognitive Science, The Ohio State University, Columbus, Ohio 43210, USA.

Publication information

J Acoust Soc Am. 2013 May;133(5):3083-93. doi: 10.1121/1.4798661.

Abstract

Processing noisy signals using the ideal binary mask improves automatic speech recognition (ASR) performance. This paper presents the first study that investigates the role of binary mask patterns in ASR under various noises, signal-to-noise ratios (SNRs), and vocabulary sizes. Binary masks are computed either by comparing the SNR within a time-frequency unit of a mixture signal with a local criterion (LC), or by comparing the local target energy with the long-term average spectral energy of speech. ASR results show that (1) akin to human speech recognition, binary masking significantly improves ASR performance even when the SNR is as low as -60 dB; (2) the ASR performance profiles are qualitatively similar to those obtained in human intelligibility experiments; (3) the difference between the LC and mixture SNR is more correlated to the recognition accuracy than LC; (4) LC at which the performance peaks is lower than 0 dB, which is the threshold that maximizes the SNR gain of processed signals. This broad agreement with human performance is rather surprising. The results also indicate that maximizing the SNR gain is probably not an appropriate goal for improving either human or machine recognition of noisy speech.
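
The two mask definitions mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `eps` guard, the toy energy values, and the use of a per-channel mean as the "long-term average" are all assumptions made for illustration. Energies are assumed to be precomputed per time-frequency unit (rows are frequency channels, columns are time frames).

```python
import math


def ratio_db(numerator, denominator, eps=1e-12):
    """Energy ratio in dB; eps (an assumption) guards against log of zero."""
    return 10.0 * math.log10((numerator + eps) / (denominator + eps))


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Keep a time-frequency unit (1) when its local SNR exceeds the LC, else discard it (0)."""
    return [
        [1 if ratio_db(t, n) > lc_db else 0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, noise_energy)
    ]


def target_binary_mask(target_energy, long_term_energy, lc_db=0.0):
    """Compare local target energy with a per-channel long-term average energy."""
    return [
        [1 if ratio_db(t, ref) > lc_db else 0 for t in t_row]
        for t_row, ref in zip(target_energy, long_term_energy)
    ]


# Toy energies: 2 frequency channels x 3 time frames (made-up numbers).
target = [[4.0, 1.0, 0.1], [0.5, 2.0, 8.0]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

ibm_0db = ideal_binary_mask(target, noise, lc_db=0.0)
ibm_neg6db = ideal_binary_mask(target, noise, lc_db=-6.0)

# Stand-in for a long-term average speech spectrum: the mean of the toy
# target energy in each channel. A real reference would come from a corpus.
long_term = [sum(row) / len(row) for row in target]
tbm = target_binary_mask(target, long_term, lc_db=0.0)

print(ibm_0db)
print(ibm_neg6db)
print(tbm)
```

In this toy case, lowering the LC from 0 dB to -6 dB retains more units, which is the kind of mask-pattern change whose effect on recognition the study examines.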
