Speech enhancement for listeners with hearing loss based on a model for vowel coding in the auditory midbrain.

Author information

Rao Akshay, Carney Laurel H

Publication information

IEEE Trans Biomed Eng. 2014 Jul;61(7):2081-91. doi: 10.1109/TBME.2014.2313618. Epub 2014 Mar 25.

Abstract

A novel signal-processing strategy is proposed to enhance speech for listeners with hearing loss. The strategy focuses on improving vowel perception based on a recent hypothesis for vowel coding in the auditory system. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A recent hypothesis focuses instead on vowel encoding in the auditory midbrain, and suggests a robust representation of formants. AN fiber discharge rates are characterized by pitch-related fluctuations having frequency-dependent modulation depths. Fibers tuned to frequencies near formants exhibit weaker pitch-related fluctuations than those tuned to frequencies between formants. Many auditory midbrain neurons show tuning to amplitude modulation frequency in addition to audio frequency. According to the auditory midbrain vowel encoding hypothesis, the response map of a population of midbrain neurons tuned to modulations near voice pitch exhibits minima near formant frequencies, due to the lack of strong pitch-related fluctuations at their inputs. This representation is robust over the range of noise conditions in which speech intelligibility is also robust for normal-hearing listeners. Based on this hypothesis, a vowel-enhancement strategy has been proposed that aims to restore vowel encoding at the level of the auditory midbrain. The signal processing consists of pitch tracking, formant tracking, and formant enhancement. The novel formant-tracking method proposed here estimates the first two formant frequencies by modeling characteristics of the auditory periphery, such as saturated discharge rates of AN fibers and modulation tuning properties of auditory midbrain neurons. 
The formant enhancement stage aims to restore the representation of formants at the level of the midbrain by increasing the dominance of a single harmonic near each formant and saturating that frequency channel. A MATLAB implementation of the system with low computational complexity was developed. Objective tests of the formant-tracking subsystem on vowels suggest that the method generalizes well over a wide range of speakers and vowels.
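The harmonic-dominance idea behind the formant enhancement stage can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: it assumes the pitch (F0) and the first two formant frequencies are already known, it omits the channel-saturation step, and the function names, the 6 dB gain, and the example vowel values are all hypothetical. The helper `two_tone_modulation_depth` uses the standard result that the envelope of two summed sinusoids with amplitudes a and b swings between |a - b| and a + b, so its modulation depth is min(a, b) / max(a, b). Treating a frequency channel as containing only two neighboring harmonics is a simplification.

```python
from typing import List, Sequence


def nearest_harmonic(f0_hz: float, formant_hz: float) -> int:
    """Return the harmonic number (>= 1) of f0_hz closest to formant_hz."""
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    return max(1, round(formant_hz / f0_hz))


def enhance_harmonics(
    amplitudes: Sequence[float],
    f0_hz: float,
    formants_hz: Sequence[float],
    gain_db: float = 6.0,  # hypothetical value, not taken from the paper
) -> List[float]:
    """Boost the single harmonic nearest each formant.

    amplitudes[k - 1] is the linear amplitude of harmonic k.
    Formants above the highest available harmonic are ignored.
    """
    gain = 10 ** (gain_db / 20)
    out = list(amplitudes)
    for formant_hz in formants_hz:
        k = nearest_harmonic(f0_hz, formant_hz)
        if k <= len(out):
            out[k - 1] *= gain
    return out


def two_tone_modulation_depth(a: float, b: float) -> float:
    """Envelope modulation depth of two summed sinusoids, amplitudes a and b.

    The envelope ranges from |a - b| to a + b, so
    depth = (max - min) / (max + min) = min(a, b) / max(a, b).
    """
    hi, lo = max(abs(a), abs(b)), min(abs(a), abs(b))
    return 0.0 if hi == 0 else lo / hi


if __name__ == "__main__":
    f0 = 100.0                   # assumed voice pitch in Hz
    formants = [730.0, 1090.0]   # assumed F1 and F2 in Hz
    flat = [1.0] * 12            # toy spectrum: 12 equal-amplitude harmonics
    boosted = enhance_harmonics(flat, f0, formants)

    k1 = nearest_harmonic(f0, formants[0])
    before = two_tone_modulation_depth(flat[k1 - 1], flat[k1])
    after = two_tone_modulation_depth(boosted[k1 - 1], boosted[k1])
    print(f"harmonic {k1}: beat depth {before:.2f} -> {after:.2f}")
```

In this toy case, boosting the harmonic nearest F1 lowers the F0-rate beat depth between that harmonic and its upper neighbor. This corresponds to the weaker pitch-related fluctuations that the abstract attributes to channels tuned near formants.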
