The relative importance of spectral cues for vowel recognition in severe noise.

Affiliation

Department of Electrical, Electronic and Computer Engineering, University of Pretoria, University Road, Pretoria, 0002, South Africa.

Publication information

J Acoust Soc Am. 2012 Oct;132(4):2652-62. doi: 10.1121/1.4751543.

DOI:10.1121/1.4751543

PMID:23039458

Abstract

The importance of formants and spectral shape was investigated for vowel perception in severe noise. Twelve vowels were synthesized using two different synthesis methods, one where the original spectral detail was preserved, and one where the vowel was represented by the spectral peaks of the first three formants. In addition, formants F1 and F2 were suppressed individually to investigate the importance of each in severe noise. Vowels were presented to listeners in quiet and in speech-shaped noise at signal to noise ratios (SNRs) of 0, -5, and -10 dB, and vowel confusions were determined in a number of conditions. Results suggest that the auditory system relies on formant information for vowel perception irrespective of the SNR, but that, as noise increases, it relies increasingly on more complete spectral information to perform formant extraction. A second finding was that, while F2 is more important in quiet or low noise conditions, F1 and F2 are of similar importance in severe noise.
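The abstract states the noise levels as SNRs of 0, -5, and -10 dB. As a rough illustration of what those values mean, the sketch below scales a noise signal so that its RMS level sits at a chosen SNR relative to a signal, using SNR(dB) = 20·log10(RMS_signal / RMS_noise). It is not the authors' procedure: the function names (`rms`, `mix_at_snr`, `measure_snr_db`), the 500 Hz sine used in place of a synthesized vowel, the white Gaussian noise used in place of speech-shaped noise, and the 16 kHz sampling rate are all assumptions made for the example.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` to the requested SNR relative to `signal`, then add them.

    Returns (mixture, scaled_noise). Hypothetical helper, not from the paper.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    # SNR(dB) = 20*log10(rms_signal / rms_noise), solved for the noise RMS.
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise


def measure_snr_db(signal, noise):
    """SNR in dB computed from the RMS levels of the two signals."""
    return 20 * math.log10(rms(signal) / rms(noise))


# Stand-in stimuli (assumptions): a 100 ms, 500 Hz sine instead of a vowel,
# and white Gaussian noise instead of speech-shaped noise.
rng = random.Random(0)
fs = 16000  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 500 * t / fs) for t in range(fs // 10)]
noise = [rng.gauss(0.0, 1.0) for _ in range(len(signal))]

for target in (0, -5, -10):
    _, scaled = mix_at_snr(signal, noise, target)
    print(target, round(measure_snr_db(signal, scaled), 6))
```

At -10 dB the scaled noise has roughly 3.16 times the RMS amplitude of the signal (10 times the power), which gives a sense of how severe the most difficult condition is.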
