Departments of Biomedical Engineering, and Neurobiology & Anatomy, University of Rochester, Rochester, New York 14642.
Department of Linguistics, University of Rochester, Rochester, New York 14627-0096.
eNeuro. 2015 Jul 20;2(4). doi: 10.1523/ENEURO.0004-15.2015. eCollection 2015 Jul-Aug.
Current models for neural coding of vowels are typically based on linear descriptions of the auditory periphery, and fail at high sound levels and in background noise. These models rely on either auditory nerve discharge rates or phase locking to temporal fine structure. However, both discharge rates and phase locking saturate at moderate to high sound levels, and phase locking is degraded in the CNS at middle to high frequencies. The fact that speech intelligibility is robust over a wide range of sound levels is problematic for codes that deteriorate as the sound level increases. Additionally, a successful neural code must function for speech in background noise at levels that are tolerated by listeners. The model presented here resolves these problems, and incorporates several key response properties of the nonlinear auditory periphery, including saturation, synchrony capture, and phase locking to both fine structure and envelope temporal features. The model also includes the properties of the auditory midbrain, where discharge rates are tuned to amplitude fluctuation rates. The nonlinear peripheral response features create contrasts in the amplitudes of low-frequency neural rate fluctuations across the population. These patterns of fluctuations result in a response profile in the midbrain that encodes vowel formants over a wide range of levels and in background noise. The hypothesized code is supported by electrophysiological recordings from the inferior colliculus of awake rabbits. This model provides information for understanding the structure of cross-linguistic vowel spaces, and suggests strategies for automatic formant detection and speech enhancement for listeners with hearing loss.
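The mechanism sketched in the abstract can be illustrated with a toy simulation (this is an illustrative sketch, not the authors' actual model; the filterbank, resonance shape, and all parameters below are hypothetical choices). A vowel-like harmonic complex with a single formant is passed through a crude Gaussian filterbank, and the depth of the f0-rate envelope fluctuation is measured in each channel. The channel tuned near the formant is dominated by one large harmonic (synchrony capture), so its envelope is comparatively flat, whereas channels between or away from formants beat at the fundamental and fluctuate deeply:

```python
import numpy as np

fs = 16000            # sample rate (Hz)
t = np.arange(int(fs * 0.5)) / fs
f0 = 100.0            # fundamental frequency (Hz)
formant = 700.0       # hypothetical /a/-like first-formant peak (Hz)

# Harmonic complex whose harmonic amplitudes follow a resonance-shaped
# curve peaking at the formant frequency (illustrative spectral shape).
sig = np.zeros_like(t)
for h in range(1, 40):
    f = h * f0
    gain = 1.0 / (1.0 + ((f - formant) / 60.0) ** 2)
    sig += gain * np.sin(2 * np.pi * f * t)

def channel_envelope(x, cf, bw=80.0):
    """Hilbert envelope of x after a Gaussian bandpass filter at cf (Hz)."""
    n = len(x)
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, 1.0 / fs)
    w = np.exp(-((freqs - cf) / bw) ** 2)
    w[freqs <= 0] = 0.0               # keep positive freqs -> analytic signal
    return np.abs(np.fft.ifft(2.0 * X * w))

# Relative envelope-fluctuation depth (std/mean) across channel CFs.
cfs = np.arange(300.0, 1301.0, 100.0)
fluct = np.array([(e := channel_envelope(sig, cf)).std() / e.mean()
                  for cf in cfs])

# The fluctuation profile dips at the formant: a midbrain neuron tuned to
# f0-rate fluctuations would respond least in the formant channel, so the
# across-CF rate profile carries a "negative image" of the formant peaks.
print(cfs[int(np.argmin(fluct))])
```

The key point of the sketch is that formant frequencies are recoverable from contrasts in low-frequency envelope fluctuation amplitudes across the channel population, rather than from discharge rate or fine-structure phase locking alone, which is the level- and noise-robustness argument made in the abstract.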