Medical Research Council Cognition and Brain Sciences Unit, Cambridge, United Kingdom.
Ear Hear. 2013 Jul-Aug;34(4):426-36. doi: 10.1097/AUD.0b013e31827535f8.
Several studies have shown that the ability to identify the timbre of musical instruments is reduced in cochlear implant (CI) users compared with normal-hearing (NH) listeners. However, most of these studies have focused on tasks that require specific musical knowledge. In contrast, the present study investigates the perception of timbre by CI subjects using a multidimensional scaling (MDS) paradigm. The main objective was to investigate whether CI subjects use the same cues as NH listeners do to differentiate the timbre of musical instruments.
Three groups of 10 NH subjects and one group of 10 CI subjects were asked to make dissimilarity judgments between pairs of instrumental sounds. The stimuli were 16 synthetic instrument tones spanning a wide range of instrument families. All sounds had the same fundamental frequency (261 Hz) and were balanced in loudness and in perceived duration before the experiment. One group of NH subjects listened to unprocessed stimuli. The other two groups of NH subjects listened to the same stimuli passed through a four-channel or an eight-channel noise vocoder, designed to simulate the signal processing performed by a real CI. Subjects were presented with all possible combinations of pairs of instruments and had to estimate, for each pair, the amount of dissimilarity between the two sounds. These estimates were used to construct dissimilarity matrices, which were further analyzed using an MDS model. The model output gave, for each subject group, an optimal graphical representation of the perceptual distances between stimuli (the so-called "timbre space").
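As a rough illustration of how a dissimilarity matrix can be turned into a spatial configuration, the sketch below applies classical (Torgerson) MDS using NumPy. This is an assumption made for illustration: the abstract does not say which MDS model the study used, and the function name `classical_mds` and the random data are hypothetical, not taken from the study.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Classical (Torgerson) MDS; an illustrative stand-in, not the study's model."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical data: 16 stimuli placed at random in a 2-D "timbre space".
rng = np.random.default_rng(0)
true_points = rng.normal(size=(16, 2))
diff = true_points[:, None, :] - true_points[None, :, :]
dissim = np.sqrt((diff ** 2).sum(axis=-1))  # symmetric 16 x 16 matrix

coords = classical_mds(dissim, n_dims=2)
# Distances between the recovered coordinates, for comparison with the input.
recovered = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
```

With real listener ratings the matrix would usually be averaged or modelled per subject first, and the recovered distances would only approximate the judgments.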
For all groups, the first two dimensions of the timbre space were strikingly similar and correlated strongly with the logarithm of the attack time and with the center of gravity of the spectral envelope, respectively. The acoustic correlate of the third dimension differed across groups but only accounted for a small proportion of the variance explained by the MDS solution. Surprisingly, CI subjects and NH subjects listening to noise-vocoded simulations gave relatively more weight to the spectral envelope dimension and less weight to the attack-time dimension when making their judgments than NH subjects listening to unprocessed stimuli. One possible reason for the relatively higher salience of spectral envelope cues in real and simulated CIs may be that the degradation of local fine spectral details produced a more stable spectral envelope across the stimulus duration.
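The two acoustic correlates named above can be approximated in a few lines. The sketch below is a minimal version under stated assumptions: the spectral centre of gravity is taken as the magnitude-weighted mean frequency of the whole signal, and attack time is measured between 10% and 90% of the envelope peak. Those thresholds and the function names are hypothetical choices, not the definitions used in the study.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of the whole signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float((freqs * spectrum).sum() / spectrum.sum())

def log_attack_time(envelope, sample_rate, low=0.1, high=0.9):
    """log10 of the time the envelope takes to rise from low to high
    fractions of its peak (thresholds are assumed, not from the study)."""
    env = np.asarray(envelope, dtype=float)
    peak = env.max()
    start = int(np.argmax(env >= low * peak))  # first sample above low threshold
    end = int(np.argmax(env >= high * peak))   # first sample above high threshold
    n_samples = max(end - start, 1)            # avoid log10(0)
    return float(np.log10(n_samples / sample_rate))

# Hypothetical test signals: a 1 kHz sine and a 100 ms linear-ramp envelope.
sr = 8000
t = np.arange(sr) / sr
centroid = spectral_centroid(np.sin(2 * np.pi * 1000 * t), sr)

env_sr = 1000
ramp = np.minimum(np.arange(env_sr) / (0.1 * env_sr), 1.0)
lat = log_attack_time(ramp, env_sr)
```

Each descriptor would then be correlated with the stimulus coordinates along one MDS dimension, for example with `np.corrcoef`.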
The internal representation of musical timbre for isolated musical instrument sounds was found to be similar in NH and in CI listeners. This suggests that training procedures designed to improve timbre recognition in CIs will indeed train CI subjects to use the same cues as NH listeners. Furthermore, NH subjects listening to noise-vocoded sounds appear to be a good model of CI timbre perception as they show the same first two perceptual dimensions as CI subjects do and also exhibit a similar change in perceptual weights applied to these two dimensions. This last finding validates the use of simulations to evaluate and compare training procedures to improve timbre perception in CIs.