Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, USA.
Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota, USA.
Ear Hear. 2021 Sep/Oct;42(5):1412-1427. doi: 10.1097/AUD.0000000000001043.
Cochlear implant (CI) recipients are at a severe disadvantage compared with normal-hearing listeners in distinguishing consonants that differ by place of articulation because the key spectral differences are degraded by the implant. One component of that degradation is the upward shifting of spectral energy that occurs with a shallow insertion depth of a CI. The present study aimed to systematically measure the effects of spectral shifting on word recognition and phoneme categorization by controlling the amount of shifting and using stimuli whose identification depends specifically on perceiving frequency cues. We hypothesized that listeners would be biased toward perceiving phonemes that contain higher-frequency components because of the upward frequency shift and that intelligibility would decrease as spectral shifting increased.
Normal-hearing listeners (n = 15) heard sine wave-vocoded speech with upward frequency shifts of 0, 2, 4, and 6 mm of cochlear space, simulating shallow CI insertion depth. Stimuli included monosyllabic words and /b/-/d/ and /ʃ/-/s/ continua that varied systematically by formant frequency transitions or frication noise spectral peaks, respectively. Recalibration to spectral shifting was operationally defined as shifting perceptual acoustic-phonetic mapping commensurate with the spectral shift: in other words, adjusting frequency expectations for both phonemes upward so that a perceptual distinction remains, rather than hearing all upward-shifted phonemes as the higher-frequency member of the pair.
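To give a sense of what a shift expressed in millimeters of cochlear space means in hertz, the following is a minimal sketch that converts the 0, 2, 4, and 6 mm shifts into frequencies. It assumes the Greenwood (1990) human place-frequency map with its commonly cited constants; the abstract does not state which map or vocoder parameters the study used, and the function names and probe frequencies here are hypothetical.

```python
import math

# Greenwood (1990) constants for the human cochlea.
# ASSUMPTION: the abstract does not say which place-frequency map was used.
A = 165.4      # scaling constant (Hz)
ALPHA = 0.06   # slope per mm of distance from the apex
K = 0.88       # low-frequency correction term


def freq_to_place(freq_hz):
    """Cochlear place (mm from the apex) whose characteristic frequency is freq_hz."""
    return math.log10(freq_hz / A + K) / ALPHA


def place_to_freq(place_mm):
    """Characteristic frequency (Hz) at a place given in mm from the apex."""
    return A * (10 ** (ALPHA * place_mm) - K)


def shift_upward(freq_hz, shift_mm):
    """Frequency reached by moving shift_mm toward the base from freq_hz's place."""
    return place_to_freq(freq_to_place(freq_hz) + shift_mm)


if __name__ == "__main__":
    # Hypothetical probe frequencies, chosen only for illustration.
    probes = (500, 1000, 2000, 4000)
    for shift in (0, 2, 4, 6):
        shifted = [round(shift_upward(f, shift)) for f in probes]
        print(f"{shift} mm shift: {shifted}")
```

Because the map is roughly logarithmic above a few hundred hertz, a fixed shift in millimeters corresponds to a roughly constant frequency ratio rather than a constant difference in hertz, which is why higher-frequency cues move further in absolute terms.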
For moderate amounts of spectral shifting, group data suggested a general "halfway" recalibration to spectral shifting, but individual data suggested a notably different conclusion: half of the listeners were able to recalibrate fully, while the other half were utterly unable to categorize shifted speech with any reliability. No participants demonstrated a pattern intermediate to these two extremes. Intelligibility of words decreased with greater amounts of spectral shifting, also showing loose clusters of better- and poorer-performing listeners. Phonetic analysis of word errors revealed that certain cues (place and manner of articulation) were more susceptible to being compromised by a frequency shift, while voicing was robust to spectral shifting.
Shifting the frequency spectrum of speech has systematic effects that are in line with known properties of speech acoustics, but the ensuing difficulties cannot be predicted based on tonotopic mismatch alone. Difficulties are subject to substantial individual differences in the capacity to adjust acoustic-phonetic mapping. These results help to explain why speech recognition in CI listeners cannot be fully predicted by peripheral factors like electrode placement and spectral resolution; even among listeners with functionally equivalent auditory input, there is an additional factor of simply being able or unable to flexibly adjust acoustic-phonetic mapping. This individual variability could motivate precise treatment approaches guided by an individual's relative reliance on wideband frequency representation (even if it is mismatched) or limited frequency coverage whose tonotopy is preserved.