Microscopic and Blind Prediction of Speech Intelligibility: Theory and Practice.

Authors

Karbasi Mahdie, Zeiler Steffen, Kolossa Dorothea

Affiliation

Cognitive signal processing group, Electrical engineering department, Ruhr-Universität Bochum, Universitätsstraße 150, 44801 Bochum, NRW, Germany.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2022;30:2141-2155. doi: 10.1109/taslp.2022.3184888. Epub 2022 Jun 30.

Abstract

Being able to estimate speech intelligibility without the need for listening tests would confer great benefits for a wide range of speech processing applications. Many attempts have therefore been made to introduce an objective, and ideally reference-free, measure for this purpose. Most works analyze speech intelligibility prediction (SIP) methods from a macroscopic point of view, averaging over longer time spans. This paper, in contrast, presents a theoretical framework for the microscopic evaluation of SIP methods. Within our framework, a Statistically estimated Accuracy based on Theory (StAT) is derived, which numerically quantifies the statistical limitations inherent in microscopic SIP. A state-of-the-art approach to microscopic SIP, namely the use of automatic speech recognition (ASR) to directly predict listening test results, is evaluated within this framework. The practical results are in good agreement with the theory. As the final contribution, a fully blind DIscriminative Speech intelligibility Predictor (DISP) is introduced and is also evaluated within the StAT framework. It is shown that this novel, blind estimator can predict intelligibility as well as, and often with better accuracy than, the non-blind ASR-based approach, and that its results are again in good agreement with its theoretically derived performance potential.
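
The distinction between macroscopic and microscopic evaluation can be illustrated with a minimal sketch. It is not the paper's StAT measure, whose definition is not given in the abstract. It assumes that listening-test results and predictions are both available as per-word binary outcomes (1 = word recognized, 0 = not recognized); the function names and the toy data are hypothetical.

```python
from typing import Sequence


def macroscopic_error(listener: Sequence[int], predicted: Sequence[int]) -> float:
    """Absolute difference between the two average word recognition rates."""
    if len(listener) != len(predicted) or not listener:
        raise ValueError("sequences must be non-empty and of equal length")
    return abs(sum(listener) / len(listener) - sum(predicted) / len(predicted))


def microscopic_accuracy(listener: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of words whose predicted outcome matches the listener's outcome."""
    if len(listener) != len(predicted) or not listener:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for l, p in zip(listener, predicted) if l == p)
    return matches / len(listener)


# Toy data (hypothetical): per-word outcomes for eight words.
listener = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [0, 0, 1, 1, 1, 0, 1, 0]

print(macroscopic_error(listener, predicted))     # averages agree exactly
print(microscopic_accuracy(listener, predicted))  # only half the words agree
```

In this toy case the predictor matches the average recognition rate exactly, yet it gets only half of the individual words right, which is the kind of discrepancy a microscopic evaluation is meant to expose.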
