The role of binary mask patterns in automatic speech recognition in background noise.

Affiliation

Department of Computer Science and Engineering and Center for Cognitive Science, The Ohio State University, Columbus, Ohio 43210, USA.

Publication

J Acoust Soc Am. 2013 May;133(5):3083-93. doi: 10.1121/1.4798661.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4109294/

Abstract

Processing noisy signals with the ideal binary mask improves automatic speech recognition (ASR) performance. This paper presents the first study of the role of binary mask patterns in ASR under various noises, signal-to-noise ratios (SNRs), and vocabulary sizes. Binary masks are computed either by comparing the SNR within each time-frequency unit of the mixture signal with a local criterion (LC), or by comparing the local target energy with the long-term average spectral energy of speech. ASR results show that (1) as in human speech recognition, binary masking significantly improves ASR performance even when the SNR is as low as −60 dB; (2) the ASR performance profiles are qualitatively similar to those obtained in human intelligibility experiments; (3) the difference between the LC and the mixture SNR correlates more strongly with recognition accuracy than the LC itself does; and (4) the LC at which performance peaks is lower than 0 dB, the threshold that maximizes the SNR gain of the processed signals. This broad agreement with human performance is rather surprising. The results also indicate that maximizing the SNR gain is probably not an appropriate goal for improving either human or machine recognition of noisy speech.
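The abstract names two ways of labeling time-frequency (T-F) units and contrasts recognition accuracy with the SNR gain. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. Target and noise energies per T-F unit are supplied directly as nested lists `S[t][f]` and `N[t][f]` (no filterbank or cochleagram is computed), energies are assumed to add within a unit (no speech-noise cross terms), and the SNR of a masked signal is assumed to count retained noise energy and discarded target energy as error. All function names, the per-channel long-term average used as the reference in `target_binary_mask`, and the random energies are hypothetical.

```python
import math
import random

EPS = 1e-12  # guards against log of zero for silent units


def local_snr_db(s, n):
    """Local SNR (dB) of one T-F unit given target energy s and noise energy n."""
    return 10.0 * math.log10((s + EPS) / (n + EPS))


def ideal_binary_mask(S, N, lc_db):
    """IBM: 1 where the unit's local SNR exceeds the local criterion LC, else 0."""
    return [[1 if local_snr_db(s, n) > lc_db else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(S, N)]


def target_binary_mask(S, lc_db):
    """Hypothetical target-mask sketch: compare each unit's target energy with the
    long-term average target energy of its frequency channel (assumed here to be
    estimated from S itself). The noise is never consulted."""
    n_frames, n_chans = len(S), len(S[0])
    ltas = [sum(S[t][f] for t in range(n_frames)) / n_frames for f in range(n_chans)]
    return [[1 if local_snr_db(S[t][f], ltas[f]) > lc_db else 0 for f in range(n_chans)]
            for t in range(n_frames)]


def input_snr_db(S, N):
    """Mixture SNR (dB) over all T-F units."""
    return 10.0 * math.log10(sum(map(sum, S)) / sum(map(sum, N)))


def output_snr_db(S, N, mask):
    """SNR (dB) of the masked mixture measured against the clean target.
    Assumes energies add within a unit: a kept unit contributes its noise energy
    to the error, a discarded unit contributes its missing target energy."""
    target = sum(map(sum, S))
    error = 0.0
    for s_row, n_row, m_row in zip(S, N, mask):
        for s, n, m in zip(s_row, n_row, m_row):
            error += n if m else s
    return 10.0 * math.log10(target / error)


def random_energies(rng, frames, chans, mean):
    """Synthetic, exponentially distributed unit energies (illustration only)."""
    return [[rng.expovariate(1.0 / mean) for _ in range(chans)] for _ in range(frames)]


if __name__ == "__main__":
    rng = random.Random(0)
    S = random_energies(rng, 50, 32, mean=1.0)
    N = random_energies(rng, 50, 32, mean=2.0)
    snr_in = input_snr_db(S, N)
    for lc in range(-30, 31, 5):
        gain = output_snr_db(S, N, ideal_binary_mask(S, N, lc)) - snr_in
        print(f"LC {lc:+3d} dB  LC - mixture SNR {lc - snr_in:+6.1f} dB  "
              f"SNR gain {gain:+6.2f} dB")
```

Under these assumptions each unit's error is smallest, min(S, N), exactly when the unit is kept if and only if S > N, so an LC of 0 dB maximizes the SNR gain, matching the threshold quoted in finding (4). The sketch says nothing about where recognition accuracy peaks; according to the abstract that was measured experimentally and fell below 0 dB. The printed `LC - mixture SNR` column is the quantity that finding (3) relates to accuracy.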

Similar articles

1

Speech intelligibility in reverberation with ideal binary masking: effects of early reflections and signal-to-noise ratio threshold.

J Acoust Soc Am. 2013 Mar;133(3):1707-17. doi: 10.1121/1.4789895.

2

Role of mask pattern in intelligibility of ideal binary-masked noisy speech.

J Acoust Soc Am. 2009 Sep;126(3):1415-26. doi: 10.1121/1.3179673.

3

Comparison of ideal mask-based speech enhancement algorithms for speech mixed with white noise at low mixture signal-to-noise ratios.

J Acoust Soc Am. 2022 Dec;152(6):3458. doi: 10.1121/10.0016494.

4

Recognition of speech in noise after application of time-frequency masks: dependence on frequency and threshold parameters.

J Acoust Soc Am. 2013 Apr;133(4):2390-6. doi: 10.1121/1.4792143.

5

Perceptual effects of noise reduction by time-frequency masking of noisy speech.

J Acoust Soc Am. 2012 Oct;132(4):2690-9. doi: 10.1121/1.4747006.

6

The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio.

J Acoust Soc Am. 2013 Nov;134(5):EL452-8. doi: 10.1121/1.4824632.

7

Intelligibility of reverberant noisy speech with ideal binary masking.

J Acoust Soc Am. 2011 Oct;130(4):2153-61. doi: 10.1121/1.3631668.

8

Effect of the division between early and late reflections on intelligibility of ideal binary-masked speech.

J Acoust Soc Am. 2015 May;137(5):2801-10. doi: 10.1121/1.4919287.

9

Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

J Acoust Soc Am. 2011 Sep;130(3):1475-87. doi: 10.1121/1.3621502.

Cited by

1

Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.

IEEE/ACM Trans Audio Speech Lang Process. 2015 Jan;23(1):92-101. doi: 10.1109/TASLP.2014.2372314. Epub 2015 Jan 14.

2

On Training Targets for Supervised Speech Separation.

IEEE/ACM Trans Audio Speech Lang Process. 2014 Dec;22(12):1849-1858. doi: 10.1109/TASLP.2014.2352935.
