Xu Li, Pfingst Bryan E
School of Hearing, Speech and Language Sciences, Ohio University, Athens, OH 45701, USA.
Hear Res. 2008 Aug;242(1-2):132-40. doi: 10.1016/j.heares.2007.12.010. Epub 2007 Dec 28.
Features of stimulation important for speech recognition in people with normal hearing and in people using implanted auditory prostheses include spectral information represented by place of stimulation along the tonotopic axis and temporal information represented in low-frequency envelopes of the signal. The relative contributions of these features to speech recognition and their interactions have been studied using vocoder-like simulations of cochlear implant speech processors presented to listeners with normal hearing. In these studies, spectral/place information was manipulated by varying the number of channels and the temporal-envelope information was manipulated by varying the lowpass cutoffs of the envelope extractors. In quiet, consonant recognition reached a plateau at 8 channels and a lowpass cutoff frequency of 16 Hz, and vowel recognition reached a plateau at 12 channels and a lowpass cutoff frequency of 4 Hz. Phoneme (especially vowel) recognition in noise required larger numbers of channels. Lexical tone recognition required larger numbers of channels and higher lowpass cutoff frequencies. There was a tradeoff between spectral/place and temporal-envelope requirements. Most current auditory prostheses seem to deliver adequate temporal-envelope information, but the number of effective channels is suboptimal, particularly for speech recognition in noise, lexical tone recognition, and music perception.
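The two parameters manipulated in such vocoder simulations, the number of channels and the lowpass cutoff of the envelope extractors, can be illustrated with a minimal sketch. The abstract does not specify the filter designs, so everything below is an assumption made for illustration: the function names, the 150-5500 Hz analysis range, band edges spaced by the Greenwood frequency-place map, half-wave rectification, and a one-pole lowpass smoother. It is not the processing used in the reviewed studies.

```python
import math

# Greenwood frequency-place constants for the human cochlea (assumed mapping).
G_A, G_ALPHA, G_K = 165.4, 2.1, 0.88


def place_from_freq(f_hz):
    """Relative cochlear place (0 = apex, 1 = base) for a frequency in Hz."""
    return math.log10(f_hz / G_A + G_K) / G_ALPHA


def freq_from_place(x):
    """Frequency in Hz for a relative cochlear place."""
    return G_A * (10 ** (G_ALPHA * x) - G_K)


def channel_edges(n_channels, f_lo=150.0, f_hi=5500.0):
    """Band edges for n_channels analysis bands, equally spaced in place.

    The 150-5500 Hz range is a hypothetical choice for this sketch.
    """
    x_lo, x_hi = place_from_freq(f_lo), place_from_freq(f_hi)
    step = (x_hi - x_lo) / n_channels
    return [freq_from_place(x_lo + i * step) for i in range(n_channels + 1)]


def extract_envelope(signal, fs, cutoff_hz):
    """Half-wave rectify, then smooth with a one-pole lowpass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    state, env = 0.0, []
    for sample in signal:
        state += alpha * (max(sample, 0.0) - state)
        env.append(state)
    return env


def fluctuation(env):
    """Peak-to-trough fluctuation over the second half (after settling)."""
    tail = env[len(env) // 2:]
    return (max(tail) - min(tail)) / (max(tail) + min(tail))


# Spectral/place manipulation: more channels means narrower bands.
edges_8 = channel_edges(8)
edges_12 = channel_edges(12)

# Temporal manipulation: a 1 kHz tone modulated at 50 Hz, analysed with a
# low (4 Hz) and a high (400 Hz) envelope cutoff.
fs = 16000
tone = [
    (1.0 + 0.8 * math.sin(2 * math.pi * 50 * n / fs))
    * math.sin(2 * math.pi * 1000 * n / fs)
    for n in range(fs)
]
env_4hz = extract_envelope(tone, fs, 4.0)
env_400hz = extract_envelope(tone, fs, 400.0)

print([round(f) for f in edges_8])
print(fluctuation(env_4hz) < fluctuation(env_400hz))
```

In this sketch the 4 Hz extractor smooths away most of the 50 Hz modulation while the 400 Hz extractor retains it, which mirrors the temporal-envelope manipulation described above; changing the argument of `channel_edges` mirrors the spectral/place manipulation. A full vocoder would additionally bandpass-filter the input into these bands and use each envelope to modulate a noise or tone carrier.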