Spectral and temporal cues for speech recognition: implications for auditory prostheses.

Authors

Xu Li, Pfingst Bryan E

Affiliation

School of Hearing, Speech and Language Sciences, Ohio University, Athens, OH 45701, USA.

Publication

Hear Res. 2008 Aug;242(1-2):132-40. doi: 10.1016/j.heares.2007.12.010. Epub 2007 Dec 28.

DOI:10.1016/j.heares.2007.12.010

PMID:18249077

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2610393/

Abstract

Features of stimulation important for speech recognition in people with normal hearing and in people using implanted auditory prostheses include spectral information represented by place of stimulation along the tonotopic axis and temporal information represented in low-frequency envelopes of the signal. The relative contributions of these features to speech recognition and their interactions have been studied using vocoder-like simulations of cochlear implant speech processors presented to listeners with normal hearing. In these studies, spectral/place information was manipulated by varying the number of channels and the temporal-envelope information was manipulated by varying the lowpass cutoffs of the envelope extractors. Consonant and vowel recognition in quiet reached plateau at 8 and 12 channels and lowpass cutoff frequencies of 16 Hz and 4 Hz, respectively. Phoneme (especially vowel) recognition in noise required larger numbers of channels. Lexical tone recognition required larger numbers of channels and higher lowpass cutoff frequencies. There was a tradeoff between spectral/place and temporal-envelope requirements. Most current auditory prostheses seem to deliver adequate temporal-envelope information, but the number of effective channels is suboptimal, particularly for speech recognition in noise, lexical tone recognition, and music perception.
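The temporal-envelope manipulation described in the abstract can be illustrated with a minimal sketch of a single envelope extractor. The abstract does not specify the filter design, so the full-wave rectifier and one-pole lowpass filter here are assumptions, as are the function names and the test signal (a 1 kHz carrier with 50 Hz amplitude modulation). Only the 4 Hz and 16 Hz cutoffs are taken from the abstract.

```python
import math


def extract_envelope(signal, sample_rate, cutoff_hz):
    """Full-wave rectify, then smooth with a one-pole lowpass filter.

    The filter type is an assumption; the abstract only says that the
    lowpass cutoff of the envelope extractor was varied.
    """
    # Smoothing coefficient of a first-order (RC-style) lowpass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope = []
    state = 0.0
    for sample in signal:
        state += alpha * (abs(sample) - state)
        envelope.append(state)
    return envelope


def modulation_depth(envelope, skip):
    """Peak-to-trough range relative to the mean, after `skip` samples."""
    steady = envelope[skip:]
    mean = sum(steady) / len(steady)
    return (max(steady) - min(steady)) / mean


def demo(cutoff_hz, sample_rate=8000, duration_s=1.0):
    """Residual 50 Hz modulation depth in the envelope at a given cutoff."""
    n = int(sample_rate * duration_s)
    # Hypothetical test signal: 1 kHz carrier, 50 Hz amplitude modulation.
    signal = [
        (1.0 + 0.8 * math.sin(2 * math.pi * 50 * t / sample_rate))
        * math.sin(2 * math.pi * 1000 * t / sample_rate)
        for t in range(n)
    ]
    env = extract_envelope(signal, sample_rate, cutoff_hz)
    # Skip the first half so the filter has settled.
    return modulation_depth(env, skip=n // 2)


if __name__ == "__main__":
    for cutoff in (4, 16):
        print(f"cutoff {cutoff} Hz: residual depth {demo(cutoff):.3f}")
```

Lowering the cutoff removes more of the fast fluctuation from the envelope, which is the sense in which the cutoff controls how much temporal-envelope information a channel carries. A full vocoder simulation would repeat this per analysis band and use each envelope to modulate a carrier; that part is not sketched here.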


Similar articles

Human Frequency Following Responses to Vocoded Speech.

Ear Hear. 2017 Sep/Oct;38(5):e256-e267. doi: 10.1097/AUD.0000000000000432.

Relative contributions of spectral and temporal cues for phoneme recognition.

J Acoust Soc Am. 2005 May;117(5):3255-67. doi: 10.1121/1.1886405.

Spectral and temporal cues for phoneme recognition in noise.

J Acoust Soc Am. 2007 Sep;122(3):1758. doi: 10.1121/1.2767000.

Relative Contributions of Spectral and Temporal Cues to Korean Phoneme Recognition.

PLoS One. 2015 Jul 10;10(7):e0131807. doi: 10.1371/journal.pone.0131807. eCollection 2015.

Music perception with temporal cues in acoustic and electric hearing.

Ear Hear. 2004 Apr;25(2):173-85. doi: 10.1097/01.aud.0000120365.97792.2f.

The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

J Assoc Res Otolaryngol. 2004 Sep;5(3):253-60. doi: 10.1007/s10162-004-4046-1. Epub 2004 May 20.

Spectral and temporal cues in cochlear implant speech perception.

Ear Hear. 2006 Apr;27(2):208-17. doi: 10.1097/01.aud.0000202312.31837.25.

Timbre and speech perception in bimodal and bilateral cochlear-implant listeners.

Ear Hear. 2012 Sep-Oct;33(5):645-59. doi: 10.1097/AUD.0b013e318252caae.

Features of stimulation affecting tonal-speech perception: implications for cochlear prostheses.

J Acoust Soc Am. 2002 Jul;112(1):247-58. doi: 10.1121/1.1487843.

Cited by

Auditory Learning and Generalization in Older Adults: Evidence from Voice Discrimination Training.

Trends Hear. 2025 Jan-Dec;29:23312165251342436. doi: 10.1177/23312165251342436. Epub 2025 May 27.

Concussion acutely disrupts auditory processing in division I football student-athletes.

Brain Inj. 2025 Jan 2;39(1):17-25. doi: 10.1080/02699052.2024.2396012. Epub 2024 Sep 3.

The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.

Trends Hear. 2024 Jan-Dec;28:23312165241275895. doi: 10.1177/23312165241275895.

Exploring neural tracking of acoustic and linguistic speech representations in individuals with post-stroke aphasia.

Hum Brain Mapp. 2024 Jun 1;45(8):e26676. doi: 10.1002/hbm.26676.

Distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing.

Hum Brain Mapp. 2023 Dec 1;44(17):6149-6172. doi: 10.1002/hbm.26503. Epub 2023 Oct 11.

Acoustic and phonemic processing are impaired in individuals with aphasia.

Sci Rep. 2023 Jul 11;13(1):11208. doi: 10.1038/s41598-023-37624-w.

The effects of estimation accuracy, estimation approach, and number of selected channels using formant-priority channel selection for an "n-of-m" sound processing strategy for cochlear implants.

J Acoust Soc Am. 2023 May 1;153(5):3100. doi: 10.1121/10.0019416.

Effect of Realistic Test Conditions on Perception of Speech, Music, and Binaural Cues in Normal-Hearing Listeners.

Am J Audiol. 2023 Mar;32(1):170-181. doi: 10.1044/2022_AJA-22-00143. Epub 2022 Dec 29.

Difficulties Experienced by Older Listeners in Utilizing Voice Cues for Speaker Discrimination.

Front Psychol. 2022 Mar 3;13:797422. doi: 10.3389/fpsyg.2022.797422. eCollection 2022.

The Relative Weight of Temporal Envelope Cues in Different Frequency Regions for Mandarin Disyllabic Word Recognition.

Front Neurosci. 2021 Jul 15;15:670192. doi: 10.3389/fnins.2021.670192. eCollection 2021.

References

Recognition of familiar melodies by adult cochlear implant recipients and normal-hearing adults.

Cochlear Implants Int. 2002 Mar;3(1):29-53. doi: 10.1179/cim.2002.3.1.29.

Spectral and temporal cues for phoneme recognition in noise.

J Acoust Soc Am. 2007 Sep;122(3):1758. doi: 10.1121/1.2767000.

Psychophysical performance and Mandarin tone recognition in noise by cochlear implant users.

Ear Hear. 2007 Apr;28(2 Suppl):62S-65S. doi: 10.1097/AUD.0b013e318031512c.

Use of a single channel dedicated to conveying enhanced temporal periodicity cues in cochlear implants: effects on prosodic perception and vowel identification.

Int J Audiol. 2007 May;46(5):244-53. doi: 10.1080/14992020601053340.

Accuracy of cochlear implant recipients on pitch perception, melody recognition, and speech reception in noise.

Ear Hear. 2007 Jun;28(3):412-23. doi: 10.1097/AUD.0b013e3180479318.

Speech recognition in normal hearing and sensorineural hearing loss as a function of the number of spectral channels.

J Acoust Soc Am. 2006 Nov;120(5 Pt 1):2908-25. doi: 10.1121/1.2354017.

Temporal and spectral cues in Mandarin tone recognition.

J Acoust Soc Am. 2006 Nov;120(5 Pt 1):2830-40. doi: 10.1121/1.2346009.

Spectral and temporal cues in cochlear implant speech perception.

Ear Hear. 2006 Apr;27(2):208-17. doi: 10.1097/01.aud.0000202312.31837.25.

Improved music perception with explicit pitch coding in cochlear implants.

Audiol Neurootol. 2006;11(1):38-52. doi: 10.1159/000088853. Epub 2005 Oct 10.

Enhancement of temporal periodicity cues in cochlear implants: effects on prosodic perception and vowel identification.

J Acoust Soc Am. 2005 Jul;118(1):375-85. doi: 10.1121/1.1925827.
