Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America.
Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America.
PLoS Comput Biol. 2023 Feb 14;19(2):e1010862. doi: 10.1371/journal.pcbi.1010862. eCollection 2023 Feb.
Theories of efficient coding propose that the auditory system is optimized for the statistical structure of natural sounds, yet the transformations underlying optimal acoustic representations are not well understood. Using a database of natural sounds including human speech and a physiologically inspired auditory model, we explore the consequences of peripheral (cochlear) and mid-level (auditory midbrain) filter tuning transformations on the representation of natural sound spectra and modulation statistics. Whereas Fourier-based sound decompositions have constant time-frequency resolution at all frequencies, cochlear and auditory midbrain filter bandwidths increase in proportion to the filter center frequency. This form of bandwidth scaling produces a systematic decrease in spectral resolution and increase in temporal resolution with increasing frequency. Here we demonstrate that cochlear bandwidth scaling produces a frequency-dependent gain that counteracts the tendency of natural sound power to decrease with frequency, resulting in a whitened output representation. Similarly, bandwidth scaling in mid-level auditory filters further enhances the representation of natural sounds by producing a whitened modulation power spectrum (MPS) with higher modulation entropy than both the cochlear outputs and the conventional Fourier MPS. These findings suggest that the tuning characteristics of the peripheral and mid-level auditory system together produce a whitened output representation in three dimensions (frequency, temporal and spectral modulation) that reduces redundancies and allows for a more efficient use of neural resources. This hierarchical multi-stage tuning strategy is thus likely optimized to extract available information and may underlie perceptual sensitivity to natural sounds.
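The whitening argument can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the auditory model used in the study: the input is taken to have an idealized 1/f power spectrum, the filters are ideal rectangular bands, and the entropy is a plain Shannon entropy of the normalized band powers, used here only as a stand-in for the paper's modulation entropy. The function names and the parameters `bandwidth` and `q` are hypothetical.

```python
import math

def band_power_pink(f_lo, f_hi):
    """Power of an idealized 1/f spectrum integrated over [f_lo, f_hi].

    The integral of 1/f from f_lo to f_hi is ln(f_hi / f_lo).
    """
    return math.log(f_hi / f_lo)

def constant_bandwidth_outputs(centers, bandwidth):
    """Fourier-like bank: every band has the same width in Hz."""
    return [band_power_pink(fc - bandwidth / 2, fc + bandwidth / 2)
            for fc in centers]

def proportional_bandwidth_outputs(centers, q):
    """Cochlear-like bank: band width is fc / q, so it grows with fc."""
    return [band_power_pink(fc * (1 - 0.5 / q), fc * (1 + 0.5 / q))
            for fc in centers]

def entropy_bits(powers):
    """Shannon entropy (bits) of the band powers normalized to sum to 1."""
    total = sum(powers)
    return -sum((p / total) * math.log2(p / total) for p in powers)

# Hypothetical center frequencies in Hz, spaced one octave apart.
centers = [250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]

fixed = constant_bandwidth_outputs(centers, bandwidth=100.0)
scaled = proportional_bandwidth_outputs(centers, q=4.0)

print("constant bandwidth:    ", [round(p, 4) for p in fixed])
print("proportional bandwidth:", [round(p, 4) for p in scaled])
print("entropy (bits):", entropy_bits(fixed), "vs", entropy_bits(scaled))
```

With constant-width bands the output power falls as the center frequency rises, mirroring the input spectrum. With bandwidth proportional to center frequency each band integrates over the same frequency ratio, so the output power is flat across channels and the normalized distribution reaches the maximum entropy available for that number of channels. Natural sounds only approximate a 1/f spectrum, so the flatness is exact only in this idealized case.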