Shafiro Valeriy
City University of New York Graduate School and University Center, Ph.D. Program in Speech and Hearing Sciences, New York, New York 10016, USA.
Ear Hear. 2008 Jun;29(3):401-20. doi: 10.1097/AUD.0b013e31816a0cf1.
This study investigated the identification of familiar environmental sounds with varying spectral resolution to establish (1) the number of frequency channels needed to perceive a large heterogeneous set of familiar environmental sounds, (2) the role of cross-channel asynchrony in identification performance, and (3) the acoustic correlates of the spectral resolution required for identification.
In experiment 1, 60 normal-hearing listeners identified environmental sounds in a 60-alternative closed-set response task under six spectral resolution conditions (2, 4, 8, 16, 24, and 32 frequency channels) obtained with an envelope vocoder. In experiment 2, identification accuracy for varying amounts of cross-channel asynchrony was determined in 10 normal-hearing listeners for sounds with preserved and degraded fine spectral structure. Experiment 3 examined identification performance of 72 listeners across the same six spectral resolution conditions as in experiment 1, but using three different signal processing methods designed to minimize cross-channel asynchrony. Follow-up acoustic and discriminant analyses were carried out to identify parameters that can distinguish environmental sounds based on required spectral resolution.
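As an illustration of what the six channel conditions mean, the sketch below divides a frequency range into N contiguous analysis bands. The abstract does not state the band spacing or the overall frequency range, so the logarithmic spacing and the 100 to 8000 Hz limits are assumptions chosen only for illustration, and `band_edges` is a hypothetical helper rather than the study's procedure.

```python
import math

def band_edges(n_channels, f_low=100.0, f_high=8000.0):
    """Return n_channels + 1 band edges in Hz.

    Assumption: edges are logarithmically spaced between f_low and f_high.
    The abstract does not specify the spacing or the frequency limits.
    """
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    ratio = f_high / f_low
    return [f_low * ratio ** (i / n_channels) for i in range(n_channels + 1)]

# The six spectral resolution conditions named in the abstract.
CONDITIONS = [2, 4, 8, 16, 24, 32]
edges_by_condition = {n: band_edges(n) for n in CONDITIONS}

for n in CONDITIONS:
    edges = edges_by_condition[n]
    widest = max(hi - lo for lo, hi in zip(edges, edges[1:]))
    print(f"{n:2d} channels: widest band spans about {widest:.0f} Hz")
```

In an envelope vocoder, each band's temporal envelope would then modulate a carrier confined to that band, so fewer channels mean wider bands and coarser spectral detail.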
Identification accuracy tended to improve with increasing spectral resolution, reaching a maximum of 76%. However, in experiment 1, performance did not change significantly beyond eight channels, whereas identification accuracy of some sounds declined with increasing spectral resolution. In experiment 2, increases in cross-channel asynchrony for sounds with preserved fine spectra had a small but significant negative effect on identification. However, minimizing the amount of asynchrony had no significant effect on the overall identification of spectrally degraded sounds in experiment 3. Acoustic analysis indicated several spectral and temporal measures that differed significantly between sounds that required eight or fewer channels and those that required 16 or more channels for 70% correct identification. Discriminant analysis revealed that the sounds could be classified into groups requiring high and low spectral resolution with 83% accuracy based on only two acoustic parameters: the number of bursts in the envelope and the standard deviation of spectral centroid velocity.
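The two acoustic parameters named above can be sketched as follows. The abstract does not define how bursts were detected or how centroid velocity was computed, so both definitions here are assumptions: a burst is counted as an upward crossing of a fixed envelope threshold, and centroid velocity is taken as the frame-to-frame change in spectral centroid divided by the frame step. The function names are hypothetical.

```python
import statistics

def count_bursts(envelope, threshold):
    """Count bursts as upward crossings of `threshold` (assumed definition)."""
    bursts = 0
    above = False
    for value in envelope:
        now_above = value >= threshold
        if now_above and not above:
            bursts += 1  # envelope has just risen to or above the threshold
        above = now_above
    return bursts

def centroid_velocity_sd(centroids_hz, frame_step_s):
    """Standard deviation of spectral centroid velocity in Hz per second.

    Assumption: velocity is the difference between successive per-frame
    centroids divided by the frame step; the population SD is used.
    """
    if len(centroids_hz) < 2:
        raise ValueError("need at least two centroid frames")
    velocities = [
        (later - earlier) / frame_step_s
        for earlier, later in zip(centroids_hz, centroids_hz[1:])
    ]
    return statistics.pstdev(velocities)

# Toy inputs, invented for illustration only.
toy_envelope = [0.0, 0.9, 0.1, 0.8, 0.7, 0.0]
toy_centroids = [1000.0, 1100.0, 1000.0]
print(count_bursts(toy_envelope, threshold=0.5))
print(centroid_velocity_sd(toy_centroids, frame_step_s=0.01))
```

A two-feature classifier such as the discriminant analysis described here would take one value of each parameter per sound as its input.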
Increasing spectral resolution generally had a positive effect on identification of familiar environmental sounds. However, across conditions, performance accuracy remained well below that of control stimuli with preserved fine spectra, despite becoming asymptotic above eight channels. Cross-channel asynchrony introduced during vocoder processing, although detrimental for some sounds, was not a major factor that prevented further improvement in overall accuracy. A spectral resolution greater than 32 channels, along with additional fine spectral and temporal information, may be required for identification of a number of environmental sounds. This study provides a preliminary basis for optimizing environmental sound perception by cochlear implant users by highlighting the role of several acoustic factors important for environmental sound identification.