A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:2581-2584. doi: 10.1109/EMBC48229.2022.9871113.

Abstract

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals. Despite improving speech quality, such approaches do not deliver the required levels of speech intelligibility in everyday noisy environments. Intelligibility-oriented (I-O) loss functions have recently been developed to train DL approaches for robust speech enhancement. Here, we formulate, for the first time, a novel canonical-correlation-based I-O loss function to train DL algorithms more effectively. Specifically, we present a canonical-correlation-based short-time objective intelligibility (CC-STOI) cost function to train a fully convolutional neural network (FCN) model. Comparative simulation experiments show that our CC-STOI-based speech enhancement framework outperforms state-of-the-art DL models trained with conventional distance-based and STOI-based loss functions on both objective and subjective evaluation measures, with unseen speakers and unseen noises. Ongoing work is evaluating the proposed approach for the design of robust hearing-assistive technology.
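
The abstract does not give the CC-STOI formula. As a minimal sketch, assuming that short-time segments of the clean and enhanced speech envelopes are arranged as (frames x bands) matrices, the textbook first canonical correlation between two such matrices can be computed as below. This is standard canonical correlation analysis (orthonormal bases via QR, then an SVD), not the authors' implementation; the function name `first_canonical_correlation` and the input layout are hypothetical.

```python
import numpy as np


def first_canonical_correlation(X: np.ndarray, Y: np.ndarray) -> float:
    """Largest canonical correlation between the columns of X and Y.

    Hypothetical layout (an assumption, not from the paper):
    X is (n_frames, p) clean-speech envelopes, Y is (n_frames, q)
    enhanced-speech envelopes for the same short-time segment.
    Assumes both centred matrices have full column rank and n_frames > p + q.
    """
    # Centre each column so correlations are computed about the mean.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of Xc and Yc.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations;
    # they are returned in descending order, so s[0] is the largest.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clip tiny floating-point overshoot above 1.
    return float(np.clip(s[0], 0.0, 1.0))
```

A hypothetical intelligibility-oriented loss could then be 1 minus this value, averaged over segments. For actual network training it would have to be written in an autodiff framework rather than NumPy, and each segment needs noticeably more frames than the total number of bands in X and Y; otherwise the two subspaces trivially intersect and the correlation is always 1.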
