Reasons why current speech-enhancement algorithms do not improve speech intelligibility and suggested solutions.

Author information

Loizou Philipos C, Kim Gibak

Affiliation

The authors are with the Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75083-0688 USA.

Publication information

IEEE Trans Audio Speech Lang Process. 2011;19(1):47-56. doi: 10.1109/TASL.2010.2045180.

DOI:10.1109/TASL.2010.2045180

PMID:21909285

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3169296/

Abstract

Existing speech enhancement algorithms can improve speech quality but not speech intelligibility, and the reasons for that are unclear. In this paper, we present a theoretical framework that can be used to analyze potential factors that can influence the intelligibility of processed speech. More specifically, this framework focuses on the fine-grain analysis of the distortions introduced by speech enhancement algorithms. It is hypothesized that if these distortions are properly controlled, then large gains in intelligibility can be achieved. To test this hypothesis, intelligibility tests are conducted with human listeners in which we present processed speech with controlled speech distortions. The aim of these tests is to assess the perceptual effect on speech intelligibility of the various distortions that speech enhancement algorithms can introduce. Results with three different enhancement algorithms indicated that certain distortions are more detrimental to speech intelligibility than others. When these distortions were properly controlled, however, large gains in intelligibility were obtained by human listeners, even by spectral-subtractive algorithms, which are known to degrade speech quality and intelligibility.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8ebe/3169296/3da38c4bf12e/nihms318747f1.jpg

Similar articles

Predicting the intelligibility of vocoded speech.

Ear Hear. 2011 May-Jun;32(3):331-8. doi: 10.1097/AUD.0b013e3181ff3515.

SNR Loss: A new objective measure for predicting speech intelligibility of noise-suppressed speech.

Speech Commun. 2011 Mar 1;53(3):340-354. doi: 10.1016/j.specom.2010.10.005.

Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality, and response times.

J Rehabil Res Dev. 1993;30(1):49-72.

Automatic modelling of perceptual judges in the context of head and neck cancer speech intelligibility.

Int J Lang Commun Disord. 2024 Jul-Aug;59(4):1422-1435. doi: 10.1111/1460-6984.13004. Epub 2024 Jan 18.

A comparative intelligibility study of single-microphone noise reduction algorithms.

J Acoust Soc Am. 2007 Sep;122(3):1777. doi: 10.1121/1.2766778.

Impact of SNR and Gain-Function Over- and Under-estimation on Speech Intelligibility.

Speech Commun. 2012 Feb;54(2):272-281. doi: 10.1016/j.specom.2011.09.002.

Auditory models of suprathreshold distortion and speech intelligibility in persons with impaired hearing.

J Am Acad Audiol. 2013 Apr;24(4):307-28. doi: 10.3766/jaaa.24.4.6.

Gain-induced speech distortions and the absence of intelligibility benefit with existing noise-reduction algorithms.

J Acoust Soc Am. 2011 Sep;130(3):1581-96. doi: 10.1121/1.3619790.

Objective intelligibility measurement of reverberant vocoded speech for normal-hearing listeners: Towards facilitating the development of speech enhancement algorithms for cochlear implants.

J Acoust Soc Am. 2024 Mar 1;155(3):2151-2168. doi: 10.1121/10.0025285.

Cited by

Influences of noise reduction on speech intelligibility, listening effort, and sound quality among adults with severe to profound hearing loss.

Front Neurosci. 2024 Jul 23;18:1407775. doi: 10.3389/fnins.2024.1407775. eCollection 2024.

Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods.

Trends Hear. 2023 Jan-Dec;27:23312165231209913. doi: 10.1177/23312165231209913.

Individual Listener Preference for Strength of Single-Microphone Noise-Reduction; Trade-off Between Noise Tolerance and Signal Distortion Tolerance.

Trends Hear. 2023 Jan-Dec;27:23312165231192304. doi: 10.1177/23312165231192304.

Enhancement of speech-in-noise comprehension through vibrotactile stimulation at the syllabic rate.

Proc Natl Acad Sci U S A. 2022 Mar 29;119(13):e2117000119. doi: 10.1073/pnas.2117000119. Epub 2022 Mar 21.

Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants.

Front Med (Lausanne). 2021 Nov 8;8:740123. doi: 10.3389/fmed.2021.740123. eCollection 2021.

Automated Applications of Acoustics for Stored Product Insect Detection, Monitoring, and Management.

Insects. 2021 Mar 19;12(3):259. doi: 10.3390/insects12030259.

Formant Frequency-based Speech Enhancement Technique to improve Intelligibility for hearing aid users with smartphone as an assistive device.

Health Innov Point Care Conf. 2017 Nov;2017:32-35. doi: 10.1109/hic.2017.8227577. Epub 2017 Dec 21.

Quantifying the Range of Signal Modification in Clinically Fit Hearing Aids.

Ear Hear. 2020 Mar-Apr;41(2):433-441. doi: 10.1097/AUD.0000000000000767.

Supervised Speech Separation Based on Deep Learning: An Overview.

IEEE/ACM Trans Audio Speech Lang Process. 2018 Oct;26(10):1702-1726. doi: 10.1109/TASLP.2018.2842159. Epub 2018 May 30.

Efficacy of a Hearing Aid Noise Reduction Function.

Trends Hear. 2018 Jan-Dec;22:2331216518782839. doi: 10.1177/2331216518782839.

References

An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

J Acoust Soc Am. 2009 Sep;126(3):1486-94. doi: 10.1121/1.3184603.

Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions.

J Acoust Soc Am. 2009 May;125(5):3387-405. doi: 10.1121/1.3097493.

Speech intelligibility in background noise with ideal binary time-frequency masking.

J Acoust Soc Am. 2009 Apr;125(4):2336-47. doi: 10.1121/1.3083233.

Digital noise reduction: outcomes from laboratory and field studies.

Int J Audiol. 2008 Aug;47(8):447-60. doi: 10.1080/14992020802033091.

A new sound coding strategy for suppressing noise in cochlear implants.

J Acoust Soc Am. 2008 Jul;124(1):498-509. doi: 10.1121/1.2924131.

Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction.

J Acoust Soc Am. 2008 Mar;123(3):1673-82. doi: 10.1121/1.2832617.

Subjective comparison and evaluation of speech enhancement algorithms.

Speech Commun. 2007 Jul;49(7):588-601. doi: 10.1016/j.specom.2006.12.006.

A comparative intelligibility study of single-microphone noise reduction algorithms.

J Acoust Soc Am. 2007 Sep;122(3):1777. doi: 10.1121/1.2766778.

Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation.

J Acoust Soc Am. 2006 Dec;120(6):4007-18. doi: 10.1121/1.2363929.

Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise.

J Acoust Soc Am. 2006 Dec;120(6):3988-97. doi: 10.1121/1.2358008.
