Speech enhancement for listeners with hearing loss based on a model for vowel coding in the auditory midbrain.

Author information

Rao Akshay, Carney Laurel H

Publication information

IEEE Trans Biomed Eng. 2014 Jul;61(7):2081-91. doi: 10.1109/TBME.2014.2313618. Epub 2014 Mar 25.

DOI:10.1109/TBME.2014.2313618

PMID:24686228

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4617199/

Abstract

A novel signal-processing strategy is proposed to enhance speech for listeners with hearing loss. The strategy focuses on improving vowel perception based on a recent hypothesis for vowel coding in the auditory system. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A recent hypothesis focuses instead on vowel encoding in the auditory midbrain, and suggests a robust representation of formants. AN fiber discharge rates are characterized by pitch-related fluctuations having frequency-dependent modulation depths. Fibers tuned to frequencies near formants exhibit weaker pitch-related fluctuations than those tuned to frequencies between formants. Many auditory midbrain neurons show tuning to amplitude modulation frequency in addition to audio frequency. According to the auditory midbrain vowel encoding hypothesis, the response map of a population of midbrain neurons tuned to modulations near voice pitch exhibits minima near formant frequencies, due to the lack of strong pitch-related fluctuations at their inputs. This representation is robust over the range of noise conditions in which speech intelligibility is also robust for normal-hearing listeners. Based on this hypothesis, a vowel-enhancement strategy has been proposed that aims to restore vowel encoding at the level of the auditory midbrain. The signal processing consists of pitch tracking, formant tracking, and formant enhancement. The novel formant-tracking method proposed here estimates the first two formant frequencies by modeling characteristics of the auditory periphery, such as saturated discharge rates of AN fibers and modulation tuning properties of auditory midbrain neurons. 
The formant enhancement stage aims to restore the representation of formants at the level of the midbrain by increasing the dominance of a single harmonic near each formant and saturating that frequency channel. A MATLAB implementation of the system with low computational complexity was developed. Objective tests of the formant-tracking subsystem on vowels suggest that the method generalizes well over a wide range of speakers and vowels.
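The idea behind the formant enhancement stage can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation, and it does not model auditory-nerve saturation or midbrain modulation tuning. It assumes a single frequency channel that passes two neighboring harmonics of the voice pitch, measures how deeply the channel envelope fluctuates at the pitch rate, and then boosts the harmonic nearest an assumed formant frequency. The function names, the 100 Hz pitch, the 520 Hz formant, the harmonic amplitudes, and the 12 dB gain are all hypothetical values chosen for illustration.

```python
import cmath
import math


def envelope_modulation_depth(amplitudes, f0, n_samples=512):
    """Depth of the pitch-rate envelope fluctuation in one channel.

    amplitudes maps harmonic number -> linear amplitude. The envelope is
    the magnitude of the complex sum of the harmonics, sampled over one
    pitch period. Returns (max - min) / (max + min), between 0 and 1.
    """
    period = 1.0 / f0
    envelope = []
    for n in range(n_samples):
        t = n * period / n_samples
        z = sum(a * cmath.exp(2j * math.pi * k * f0 * t)
                for k, a in amplitudes.items())
        envelope.append(abs(z))
    hi, lo = max(envelope), min(envelope)
    return (hi - lo) / (hi + lo)


def boost_nearest_harmonic(amplitudes, f0, formant_hz, gain_db=12.0):
    """Return a copy of amplitudes with the harmonic closest to
    formant_hz boosted by gain_db, plus that harmonic's number."""
    nearest = min(amplitudes, key=lambda k: abs(k * f0 - formant_hz))
    boosted = dict(amplitudes)
    boosted[nearest] *= 10 ** (gain_db / 20)
    return boosted, nearest


# Hypothetical channel: pitch of 100 Hz, harmonics 5 and 6 (500 and
# 600 Hz) at similar levels, with an assumed formant at 520 Hz.
f0 = 100.0
channel = {5: 1.0, 6: 0.8}

depth_before = envelope_modulation_depth(channel, f0)
enhanced, harmonic = boost_nearest_harmonic(channel, f0, formant_hz=520.0)
depth_after = envelope_modulation_depth(enhanced, f0)

print(f"boosted harmonic: {harmonic}")
print(f"modulation depth before: {depth_before:.3f}")
print(f"modulation depth after:  {depth_after:.3f}")
```

With two harmonics in the channel, the modulation depth equals the ratio of the weaker amplitude to the stronger one, so making one harmonic dominant flattens the pitch-related fluctuation. Under the hypothesis described in the abstract, that weaker fluctuation is what marks a formant in the response of midbrain neurons tuned to modulations near voice pitch.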
