Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis

Authors

Edraki Amin, Chan Wai-Yip, Jensen Jesper, Fogerty Daniel

Affiliations

Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada.

Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2021;29:210-225. doi: 10.1109/taslp.2020.3039929. Epub 2020 Nov 24.

DOI:10.1109/taslp.2020.3039929

PMID:33748329

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7978234/

Abstract

Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model with parameters optimized using Lasso regression results in combining the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
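The sparse combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 55 "intermediate measures" here are random stand-ins, the 8 informative channels and their weights are chosen arbitrarily, and the function names (`make_synthetic_data`, `lasso_coordinate_descent`), the penalty `alpha`, and the sample count are all assumptions made for illustration. The sketch only shows how a Lasso penalty drives most channel weights to exactly zero, leaving a small subset to form the prediction.

```python
import random


def soft_threshold(x, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=50):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw; w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of channel j with the residual that excludes channel j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def make_synthetic_data(seed=0, n_samples=300, n_channels=55, n_active=8):
    """Synthetic stand-in: only n_active of n_channels measures drive the target score."""
    rng = random.Random(seed)
    true_idx = sorted(rng.sample(range(n_channels), n_active))
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_channels)] for _ in range(n_samples)]
    # Hypothetical target: unit-weight sum of the active channels plus small noise
    y = [sum(row[j] for j in true_idx) + rng.gauss(0.0, 0.1) for row in X]
    return X, y, true_idx


if __name__ == "__main__":
    X, y, true_idx = make_synthetic_data()
    w = lasso_coordinate_descent(X, y, alpha=0.1)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    print("channels with non-zero weight:", selected)
    print("channels used to generate y:  ", true_idx)
```

In practice the penalty strength would be chosen by cross-validation against measured listening-test scores. The fixed `alpha=0.1` is used here only to keep the example short.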




