A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:2581-2584. doi: 10.1109/EMBC48229.2022.9871113.

Abstract

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals. Although they improve speech quality, such approaches do not deliver the required levels of speech intelligibility in everyday noisy environments. Intelligibility-oriented (I-O) loss functions have recently been developed to train DL approaches for robust speech enhancement. Here, we formulate, for the first time, a novel canonical correlation based I-O loss function to train DL algorithms more effectively. Specifically, we present a canonical correlation based short-time objective intelligibility (CC-STOI) cost function to train a fully convolutional neural network (FCN) model. We carry out comparative simulation experiments to show that our CC-STOI based speech enhancement framework outperforms state-of-the-art DL models trained with conventional distance-based and STOI-based loss functions, using objective and subjective evaluation measures, for both unseen speakers and unseen noises. Ongoing work is evaluating the proposed approach for the design of robust hearing-assistive technology.
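The abstract does not give the CC-STOI formula, so the following is only a minimal sketch of the underlying idea, assuming classical canonical correlation analysis applied to two generic feature matrices (frames by features). The function names `canonical_correlations` and `cc_loss`, and the choice of "one minus the mean canonical correlation" as a loss, are illustrative assumptions and not the authors' formulation; the STOI-specific processing (one-third octave band envelopes, short-time segmentation) is omitted.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between x (n, p) and y (n, q).

    Assumes n > max(p, q) and that both centred matrices have full
    column rank. Uses the QR/SVD formulation of classical CCA.
    """
    # Centre each feature column so correlations are not biased by the mean.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centred matrices.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def cc_loss(clean, enhanced):
    """Hypothetical loss: 1 minus the mean canonical correlation.

    Illustrative only; not the CC-STOI cost function from the paper.
    """
    return 1.0 - canonical_correlations(clean, enhanced).mean()


rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 4))      # stand-in for clean-speech features
noise = rng.standard_normal((200, 4))      # independent stand-in features
mix = np.array([[2.0, 1.0, 0.0, 0.0],
                [0.0, 2.0, 1.0, 0.0],
                [0.0, 0.0, 2.0, 1.0],
                [0.0, 0.0, 0.0, 2.0]])     # invertible linear map

loss_same = cc_loss(clean, clean @ mix)    # linearly related features
loss_noise = cc_loss(clean, noise)         # unrelated features
```

A correlation-based objective of this kind is invariant to invertible linear transformations of the features, which is one way it differs from a plain feature-distance loss. To train a network with it, the same computation would need to be written in a framework with automatic differentiation.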
