A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms.

Authors

Monir Nasser-Eddine, Magron Paul, Serizel Romain

Affiliations

Université de Lorraine, CNRS, Inria, Loria, Nancy, France.

Publication information

Trends Hear. 2024 Jan-Dec;28:23312165241292205. doi: 10.1177/23312165241292205.

DOI:10.1177/23312165241292205
PMID:39665436
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11638999/

Abstract

In the intricate acoustic landscapes where speech intelligibility is challenged by noise and reverberation, multichannel speech enhancement emerges as a promising solution for individuals with hearing loss. Such algorithms are commonly evaluated at the utterance scale. However, this approach overlooks the granular acoustic nuances revealed by phoneme-specific analysis, potentially obscuring key insights into their performance. This paper presents an in-depth phoneme-scale evaluation of three state-of-the-art multichannel speech enhancement algorithms. These algorithms (filter-and-sum network, minimum variance distortionless response, and Tango) are extensively evaluated across different noise conditions and spatial setups, employing realistic acoustic simulations with measured room impulse responses and leveraging the diversity offered by multiple microphones in a binaural hearing setup. The study emphasizes the fine-grained phoneme-scale analysis, revealing that while some phonemes, such as plosives, are heavily impacted by environmental acoustics and are challenging for the algorithms to handle, others, such as nasals and sibilants, see substantial improvements after enhancement. These investigations demonstrate important improvements in phoneme clarity in noisy conditions, with insights that could drive the development of more personalized and phoneme-aware hearing aid technologies. Additionally, while this study provides extensive data on the physical metrics of processed speech, these physical metrics do not necessarily reflect human perception of speech, and the impact of the findings presented would have to be investigated through listening tests.
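
The contrast between utterance-scale and phoneme-scale evaluation can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the metrics used, so scale-invariant SDR (SI-SDR) stands in here as one common physical metric, and the function names, the `(label, start, end)` alignment format, and the phoneme-class labels are all assumptions made for illustration. The idea is simply to cut the clean and processed signals at phoneme boundaries, score each segment, and average per class.

```python
import numpy as np
from collections import defaultdict


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant SDR in dB between a clean reference and an estimate."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def phoneme_scale_scores(clean, processed, alignment, sample_rate):
    """Average SI-SDR per phoneme class.

    `alignment` is a hypothetical list of (class_label, start_s, end_s)
    tuples, e.g. produced by a forced aligner (an assumption of this sketch).
    """
    scores = defaultdict(list)
    for label, start_s, end_s in alignment:
        a = int(start_s * sample_rate)
        b = int(end_s * sample_rate)
        if b - a < 2:  # skip segments too short to score
            continue
        scores[label].append(si_sdr(clean[a:b], processed[a:b]))
    return {label: float(np.mean(vals)) for label, vals in scores.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    clean = rng.standard_normal(fs)
    # Toy "processed" signal: heavy residual noise in the first half only.
    noise = rng.standard_normal(fs)
    noise[: fs // 2] *= 0.5
    noise[fs // 2:] *= 0.05
    processed = clean + noise
    alignment = [("plosive", 0.0, 0.5), ("nasal", 0.5, 1.0)]
    print("utterance:", si_sdr(clean, processed))
    print("per class:", phoneme_scale_scores(clean, processed, alignment, fs))
```

In this toy case the single utterance-level score hides the fact that one segment class is much noisier than the other, which is the kind of detail a per-phoneme breakdown is meant to expose.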

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/d98bb678189e/10.1177_23312165241292205-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/edfaede8db2f/10.1177_23312165241292205-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/50d610ffcf4f/10.1177_23312165241292205-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/da9ef6f75b47/10.1177_23312165241292205-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/74ea05472e97/10.1177_23312165241292205-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/d9bd79df1301/10.1177_23312165241292205-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/2f07820c8a26/10.1177_23312165241292205-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/b22bcb2ef80d/10.1177_23312165241292205-fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/2a1e2900270f/10.1177_23312165241292205-fig9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/3d34491327c3/10.1177_23312165241292205-fig10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/6580a154a14e/10.1177_23312165241292205-fig11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/cd9bd52d213a/10.1177_23312165241292205-fig12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/4730ae908e41/10.1177_23312165241292205-fig13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/bd8cc44ca5b4/10.1177_23312165241292205-fig14.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/6052662f191d/10.1177_23312165241292205-fig15.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/1c673cbf47bf/10.1177_23312165241292205-fig16.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/c90abdb7da9d/10.1177_23312165241292205-fig17.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/636e72f18afa/10.1177_23312165241292205-fig18.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/dbe45bad4d5f/10.1177_23312165241292205-fig19.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8164/11638999/9a1e230d2c3a/10.1177_23312165241292205-fig20.jpg

Similar articles

1
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms.
Trends Hear. 2024 Jan-Dec;28:23312165241292205. doi: 10.1177/23312165241292205.
2
Benefit of Hearing-Aid Amplification and Signal Enhancement for Speech Reception in Complex Listening Situations.
Trends Hear. 2024 Jan-Dec;28:23312165241271407. doi: 10.1177/23312165241271407.
3
Modeling the Intelligibility Benefit of Active Noise Cancelation in Hearing Devices That Improve Signal-to-Noise Ratio.
Trends Hear. 2024 Jan-Dec;28:23312165241260029. doi: 10.1177/23312165241260029.
4
Toward an Extended Classification of Noise-Distortion Preferences by Modeling Longitudinal Dynamics of Listening Choices.
Trends Hear. 2025 Jan-Dec;29:23312165251362018. doi: 10.1177/23312165251362018. Epub 2025 Aug 7.
5
Evaluation of Speaker-Conditioned Target Speaker Extraction Algorithms for Hearing-Impaired Listeners.
Trends Hear. 2025 Jan-Dec;29:23312165251365802. doi: 10.1177/23312165251365802. Epub 2025 Aug 11.
6
The Impact of Hearing Aids on Listening Effort and Listening-Related Fatigue - Investigations in a Virtual Realistic Listening Environment.
Trends Hear. 2024 Jan-Dec;28:23312165241265199. doi: 10.1177/23312165241265199.
7
Focusing on Positive Listening Experiences Improves Speech Intelligibility in Experienced Hearing Aid Users.
Trends Hear. 2024 Jan-Dec;28:23312165241246616. doi: 10.1177/23312165241246616.
8
Automated Measurement of Speech Recognition, Reaction Time, and Speech Rate and Their Relation to Self-Reported Listening Effort for Normal-Hearing and Hearing-Impaired Listeners Using various Maskers.
Trends Hear. 2024 Jan-Dec;28:23312165241276435. doi: 10.1177/23312165241276435.
9
Objective measure of binaural processing: Acoustic change complex in response to interaural phase differences.
Hear Res. 2024 Jul;448:109020. doi: 10.1016/j.heares.2024.109020. Epub 2024 Apr 28.
10
On the Feasibility of Using Behavioral Listening Effort Test Methods to Evaluate Auditory Performance in Cochlear Implant Users.
Trends Hear. 2024 Jan-Dec;28:23312165241240572. doi: 10.1177/23312165241240572.
