Department of Electrical Engineering, The University of Texas at Dallas, Richardson, Texas, USA.
Ear Hear. 2010 Apr;31(2):259-67. doi: 10.1097/AUD.0b013e3181c7db17.
The purpose of this study is to assess the contribution of information provided by obstruent consonants (e.g., stops and fricatives) to speech intelligibility in simulated acoustic-electric hearing. As a secondary objective, this study examines the performance of an objective measure that can potentially be used for predicting the intelligibility of vocoded speech.
Experiment 1 uses noise-corrupted sentences in which the noise-corrupted obstruent consonants are replaced with clean obstruent consonants, while the sonorant sounds (vowels, semivowels, and nasals) are left corrupted. In one condition, listeners have access only to the low-frequency (<600 Hz) acoustic portion of the clean consonant spectra; in another condition, they have access only to the higher-frequency (>600 Hz) vocoded portion of the clean consonant spectra; and in a third condition, they have access to both. In experiment 2, we investigate a speech-coding strategy that selectively attenuates the low-frequency portion of the consonant spectra while leaving the vocoded portion corrupted by noise. Finally, using the data collected in experiments 1 and 2, we evaluate how well an objective measure predicts the intelligibility of vocoded speech. This measure was originally designed to predict speech quality and has not previously been evaluated with vocoded speech.
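The segment replacement used in experiment 1 can be illustrated with a minimal sketch. It assumes that the clean and noise-corrupted signals are time-aligned sample sequences and that the obstruent boundaries are already known as sample index pairs. The function name `replace_obstruents` and the toy values are hypothetical and are not the study's implementation; the split into a low-frequency acoustic portion and a vocoded portion is not modeled here.

```python
def replace_obstruents(noisy, clean, segments):
    """Return a copy of `noisy` with clean samples inside each obstruent segment.

    noisy, clean: equal-length, time-aligned sequences of samples.
    segments: iterable of (start, end) sample indices, end exclusive
              (assumed to be known in advance).
    """
    if len(noisy) != len(clean):
        raise ValueError("signals must be time-aligned and equal in length")
    out = list(noisy)
    for start, end in segments:
        if not 0 <= start <= end <= len(out):
            raise ValueError(f"segment out of range: {(start, end)}")
        # Obstruent samples come from the clean signal;
        # sonorant samples outside the segments stay noise-corrupted.
        out[start:end] = clean[start:end]
    return out


# Toy data only: 0.0 stands for a clean sample, 1.0 for a noise-corrupted one.
clean = [0.0] * 10
noisy = [1.0] * 10
mixed = replace_obstruents(noisy, clean, [(2, 4), (7, 9)])
print(mixed)
```

In this toy example only the samples at indices 2, 3, 7, and 8 are taken from the clean signal; every other sample remains corrupted, mirroring the clean-obstruent, corrupted-sonorant stimulus design.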
Significant improvements (about 30 percentage points) in intelligibility were noted in experiment 1 in steady and two-talker masker conditions when the listeners had access to the clean obstruent consonants in both the acoustic and the vocoded portions of the spectrum. The improvement was more evident at the low signal-to-noise ratio levels (-5 and 0 dB). Further analysis indicated that it was access to the vocoded portion of the consonant spectra, rather than access to the low-frequency acoustic portion, that contributed most to the large improvements in performance. In experiment 2, a small (14 percentage points) but statistically significant improvement in performance was obtained at 0 dB signal-to-noise ratio (steady masker) when the obstruent consonants were selectively attenuated in the low-frequency acoustic portion alone (the vocoded portion was left noise corrupted). The examined objective measure predicted the intelligibility of vocoded speech with a relatively high correlation (r = 0.92 to 0.94) [corrected] in both steady and two-talker masker conditions.
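As an illustration of how a correlation between an objective measure and intelligibility scores can be computed, the sketch below implements Pearson's correlation coefficient. This assumes that the reported r is Pearson's r, which the abstract does not state. The function name `pearson_r` and the input values are hypothetical toy numbers, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy values only (perfectly linear by construction), not study data.
measure_scores = [1.0, 2.0, 3.0, 4.0]
intelligibility = [10.0, 20.0, 30.0, 40.0]
r = pearson_r(measure_scores, intelligibility)
print(r)
```

A value of r near 1 means that the objective measure ranks and scales the conditions almost linearly with the listener scores.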
Providing access to the clean obstruent spectra can yield substantial improvements in intelligibility relative to the simulated acoustic-electric condition. Much of this improvement can be attributed to the listeners having access to the clean vocoded portion of the obstruent consonants. The large contribution of obstruent consonants to speech recognition in simulated acoustic-electric hearing stems from the fact that these consonants provide reliable acoustic landmarks, which in turn enable listeners to effectively integrate pieces of the message glimpsed over temporal gaps into one coherent speech stream. It is argued that these landmarks are smeared in existing cochlear implant systems, including bimodal systems, owing to envelope compression and to the fact that obstruent consonants are probably the first to be masked by background noise. Overall, the outcomes of this study suggest that obstruent consonants need to be treated differently for improved speech recognition in noise.