Microscopic and Blind Prediction of Speech Intelligibility: Theory and Practice.

Authors

Karbasi Mahdie, Zeiler Steffen, Kolossa Dorothea

Affiliation

Cognitive signal processing group, Electrical engineering department, Ruhr-Universität Bochum, Universitätsstraße 150, 44801 Bochum, NRW, Germany.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2022;30:2141-2155. doi: 10.1109/taslp.2022.3184888. Epub 2022 Jun 30.

DOI:10.1109/taslp.2022.3184888

PMID:37007458

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10065470/

Abstract

Being able to estimate speech intelligibility without the need for listening tests would confer great benefits for a wide range of speech processing applications. Many attempts have therefore been made to introduce an objective, and ideally reference-free, measure for this purpose. Most works analyze speech intelligibility prediction (SIP) methods from a macroscopic point of view, averaging over longer time spans. This paper, in contrast, presents a theoretical framework for the microscopic evaluation of SIP methods. Within our framework, a Statistically estimated Accuracy based on Theory (StAT) is derived, which numerically quantifies the statistical limitations inherent in microscopic SIP. A state-of-the-art approach to microscopic SIP, namely, the use of automatic speech recognition (ASR) to directly predict listening test results, is evaluated within this framework. The practical results are in good agreement with the theory. As the final contribution, a fully blind DIscriminative Speech intelligibility Predictor (DISP) is introduced and is also evaluated within the StAT framework. It is shown that this novel, blind estimator can predict intelligibility as well as, and often even with better accuracy than, the non-blind ASR-based approach, and that its results are again in good agreement with its theoretically derived performance potential.
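The distinction the abstract draws between macroscopic and microscopic evaluation can be illustrated with a minimal sketch. It is not the paper's StAT measure or the DISP predictor, whose definitions are not given in the abstract. It only assumes that both the listening test and the predictor yield one binary outcome per word (1 = recognized, 0 = not recognized); the function names and the data are hypothetical.

```python
from typing import Sequence


def _check(listener: Sequence[int], predicted: Sequence[int]) -> None:
    # Both sequences must describe the same, non-empty set of words.
    if len(listener) != len(predicted) or len(listener) == 0:
        raise ValueError("sequences must be non-empty and of equal length")


def macroscopic_error(listener: Sequence[int], predicted: Sequence[int]) -> float:
    """Absolute difference between the averaged listener and predicted scores."""
    _check(listener, predicted)
    n = len(listener)
    return abs(sum(listener) / n - sum(predicted) / n)


def microscopic_accuracy(listener: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of individual words whose predicted outcome matches the listener's."""
    _check(listener, predicted)
    matches = sum(1 for l, p in zip(listener, predicted) if l == p)
    return matches / len(listener)


# Hypothetical per-word outcomes (1 = word recognized, 0 = not recognized).
listener = [1, 0, 1, 1, 0, 1, 0, 1]
predicted = [0, 1, 1, 1, 1, 0, 0, 1]

print(macroscopic_error(listener, predicted))     # averaged view
print(microscopic_accuracy(listener, predicted))  # word-by-word view
```

In this made-up example both sequences contain the same number of recognized words, so the averaged scores coincide even though the prediction is wrong for half of the individual words. That gap is the kind of behavior a microscopic evaluation is meant to expose.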

Similar articles

ASR-based speech intelligibility prediction: A review.

Hear Res. 2022 Dec;426:108606. doi: 10.1016/j.heares.2022.108606. Epub 2022 Sep 14.

A joint framework for blind prediction of binaural speech intelligibility and perceived listening effort.

Hear Res. 2022 Dec;426:108598. doi: 10.1016/j.heares.2022.108598. Epub 2022 Aug 8.

The effect of audiovisual and binaural listening on the acceptable noise level (ANL): establishing an ANL conceptual model.

J Am Acad Audiol. 2014 Feb;25(2):141-53. doi: 10.3766/jaaa.25.2.3.

Automatic Assessment of Intelligibility in Noise in Parkinson Disease: Validation Study.

J Med Internet Res. 2022 Oct 20;24(10):e40567. doi: 10.2196/40567.

Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients.

Comput Methods Programs Biomed. 2022 Mar;215:106602. doi: 10.1016/j.cmpb.2021.106602. Epub 2021 Dec 26.

OPRA-RS: A Hearing-Aid Fitting Method Based on Automatic Speech Recognition and Random Search.

Front Neurosci. 2022 Feb 21;16:779048. doi: 10.3389/fnins.2022.779048. eCollection 2022.

Validity of Off-the-Shelf Automatic Speech Recognition for Assessing Speech Intelligibility and Speech Severity in Speakers With Amyotrophic Lateral Sclerosis.

J Speech Lang Hear Res. 2022 Jun 8;65(6):2128-2143. doi: 10.1044/2022_JSLHR-21-00589. Epub 2022 May 27.

Matrix sentence intelligibility prediction using an automatic speech recognition system.

Int J Audiol. 2015;54 Suppl 2:100-7. doi: 10.3109/14992027.2015.1061708. Epub 2015 Sep 18.

Automatic modelling of perceptual judges in the context of head and neck cancer speech intelligibility.

Int J Lang Commun Disord. 2024 Jul-Aug;59(4):1422-1435. doi: 10.1111/1460-6984.13004. Epub 2024 Jan 18.

Cited by

Modeling the effect of linguistic predictability on speech intelligibility prediction.

JASA Express Lett. 2023 Mar;3(3):035207. doi: 10.1121/10.0017648.

References

Crowdsourcing as a tool in the clinical assessment of intelligibility in dysarthria: How to deal with excessive variation.

J Commun Disord. 2021 Sep-Oct;93:106135. doi: 10.1016/j.jcomdis.2021.106135. Epub 2021 Jun 17.

Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis.

IEEE/ACM Trans Audio Speech Lang Process. 2021;29:210-225. doi: 10.1109/taslp.2020.3039929. Epub 2020 Nov 24.

DARF: A data-reduced FADE version for simulations of speech recognition thresholds with real hearing aids.

Hear Res. 2021 May;404:108217. doi: 10.1016/j.heares.2021.108217. Epub 2021 Feb 22.

Improving hearing-aid gains based on automatic speech recognition.

J Acoust Soc Am. 2020 Sep;148(3):EL227. doi: 10.1121/10.0001866.

Individual Aided Speech-Recognition Performance and Predictions of Benefit for Listeners With Impaired Hearing Employing FADE.

Trends Hear. 2020 Jan-Dec;24:2331216520938929. doi: 10.1177/2331216520938929.

Predicting Speech Perception in Older Listeners with Sensorineural Hearing Loss Using Automatic Speech Recognition.

Trends Hear. 2020 Jan-Dec;24:2331216520914769. doi: 10.1177/2331216520914769.

Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.

Trends Hear. 2018 Jan-Dec;22:2331216518768954. doi: 10.1177/2331216518768954.

Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain.

J Acoust Soc Am. 2016 Oct;140(4):2670. doi: 10.1121/1.4964505.

Sentence Recognition Prediction for Hearing-impaired Listeners in Stationary and Fluctuation Noise With FADE: Empowering the Attenuation and Distortion Concept by Plomp With a Quantitative Processing Model.

Trends Hear. 2016 Sep 7;20:2331216516655795. doi: 10.1177/2331216516655795.

A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception.

J Acoust Soc Am. 2016 May;139(5):2708. doi: 10.1121/1.4948772.