Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis.

Author information

Edraki Amin, Chan Wai-Yip, Jensen Jesper, Fogerty Daniel

Affiliations

Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada.

Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2021;29:210-225. doi: 10.1109/taslp.2020.3039929. Epub 2020 Nov 24.

Abstract

Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model with parameters optimized using Lasso regression results in combining the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
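The channel-selection step described in the abstract (55 intermediate measures combined by a sparse linear model fitted with Lasso) can be illustrated with a small sketch. This is not the authors' implementation: the data are synthetic, the regularisation strength `lam` is an arbitrary choice, and names such as `make_synthetic_data` and `lasso_coordinate_descent` are hypothetical. The sketch only shows how an L1 penalty can drive most of the 55 channel weights to exactly zero, leaving a few salient channels.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise (1/2n)*||y - b - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Centre columns and targets so the intercept b can be recovered afterwards.
    col_means = [sum(row[k] for row in X) / n for k in range(p)]
    y_mean = sum(y) / n
    cols = [[row[k] - col_means[k] for row in X] for k in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    w = [0.0] * p
    residual = [yi - y_mean for yi in y]  # residual when all weights are zero
    for _ in range(n_iters):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            col = cols[k]
            # Covariance of channel k with the residual that excludes channel k.
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_sq[k] * w[k]
            new_w = soft_threshold(rho, lam) / col_sq[k]
            delta = new_w - w[k]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[k] = new_w
    intercept = y_mean - sum(wk * m for wk, m in zip(w, col_means))
    return w, intercept


def make_synthetic_data(n_conditions=300, n_channels=55, n_active=8, seed=0):
    """Synthetic stand-in (assumption): only n_active channels drive the score."""
    rng = random.Random(seed)
    active = sorted(rng.sample(range(n_channels), n_active))
    true_w = [0.0] * n_channels
    for k in active:
        true_w[k] = rng.uniform(0.8, 1.5)
    # Each row holds one intermediate measure per modulation channel.
    X = [[rng.random() for _ in range(n_channels)] for _ in range(n_conditions)]
    y = [sum(wk * xk for wk, xk in zip(true_w, row)) + rng.gauss(0.0, 0.05)
         for row in X]
    return X, y, active


X, y, active = make_synthetic_data()
weights, intercept = lasso_coordinate_descent(X, y, lam=0.02)
selected = [k for k, wk in enumerate(weights) if wk != 0.0]
print("channels with non-zero weight:", selected)
print("channels used to generate the data:", active)
```

In practice the rows of `X` would hold per-channel intermediate measures computed from clean and degraded speech, `y` would hold measured intelligibility scores, and `lam` would be tuned (for example by cross-validation) rather than fixed by hand.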
