Rao Akshay, Carney Laurel H
IEEE Trans Biomed Eng. 2014 Jul;61(7):2081-91. doi: 10.1109/TBME.2014.2313618. Epub 2014 Mar 25.
A novel signal-processing strategy is proposed to enhance speech for listeners with hearing loss. The strategy focuses on improving vowel perception based on a recent hypothesis for vowel coding in the auditory system. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A recent hypothesis focuses instead on vowel encoding in the auditory midbrain, and suggests a robust representation of formants. AN fiber discharge rates are characterized by pitch-related fluctuations having frequency-dependent modulation depths. Fibers tuned to frequencies near formants exhibit weaker pitch-related fluctuations than those tuned to frequencies between formants. Many auditory midbrain neurons show tuning to amplitude modulation frequency in addition to audio frequency. According to the auditory midbrain vowel encoding hypothesis, the response map of a population of midbrain neurons tuned to modulations near voice pitch exhibits minima near formant frequencies, due to the lack of strong pitch-related fluctuations at their inputs. This representation is robust over the range of noise conditions in which speech intelligibility is also robust for normal-hearing listeners. Based on this hypothesis, a vowel-enhancement strategy has been proposed that aims to restore vowel encoding at the level of the auditory midbrain. The signal processing consists of pitch tracking, formant tracking, and formant enhancement. The novel formant-tracking method proposed here estimates the first two formant frequencies by modeling characteristics of the auditory periphery, such as saturated discharge rates of AN fibers and modulation tuning properties of auditory midbrain neurons. 
The formant enhancement stage aims to restore the representation of formants at the level of the midbrain by increasing the dominance of a single harmonic near each formant and saturating that frequency channel. A MATLAB implementation of the system with low computational complexity was developed. Objective tests of the formant-tracking subsystem on vowels suggest that the method generalizes well over a wide range of speakers and vowels.
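The formant-enhancement idea described above (making a single harmonic near each formant dominant) can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: the function names, the harmonic-amplitude representation, and the `gain_db` value are all assumptions made for illustration, and the channel-saturation step mentioned in the abstract is not modeled here.

```python
def nearest_harmonic(f0_hz, formant_hz):
    """Return the harmonic number (>= 1) of f0_hz closest to formant_hz."""
    return max(1, round(formant_hz / f0_hz))


def enhance_formants(harmonic_amps, f0_hz, formants_hz, gain_db=12.0):
    """Boost the single harmonic nearest each formant.

    harmonic_amps[k - 1] is the linear amplitude of harmonic k.
    gain_db is a hypothetical boost; the abstract does not give a value.
    Returns a new list and leaves the input unchanged.
    """
    gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude ratio
    enhanced = list(harmonic_amps)
    for formant_hz in formants_hz:
        k = nearest_harmonic(f0_hz, formant_hz)
        if k <= len(enhanced):  # skip formants above the highest harmonic
            enhanced[k - 1] *= gain
    return enhanced


if __name__ == "__main__":
    # Assumed inputs: a pitch tracker reports F0 = 100 Hz and a formant
    # tracker reports F1 = 500 Hz, F2 = 1500 Hz (illustrative values only).
    flat_spectrum = [1.0] * 20
    boosted = enhance_formants(flat_spectrum, 100.0, [500.0, 1500.0], gain_db=20.0)
    print(boosted)
```

In this sketch the pitch and formant estimates are taken as given inputs. In the strategy described in the abstract they would come from the pitch-tracking and formant-tracking stages that precede enhancement.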