Ashutosh Pandey, DeLiang Wang
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.
Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1270-1279. doi: 10.1109/taslp.2021.3064421. Epub 2021 Mar 8.
Speech enhancement in the time domain has become increasingly popular in recent years, due to its ability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and the predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
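The core idea of the proposed loss can be illustrated with a minimal NumPy sketch: the predicted noise is tied to the enhanced speech through the mixture, so a magnitude-only loss on both terms still constrains phase. The function names, the Hann-windowed framing, and the simple mean-absolute-error weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def stft_mag(x, frame_len=512, hop=256):
    """Magnitude STFT via Hann-windowed frames and the real FFT (illustrative)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def magnitude_noise_loss(clean, noisy, enhanced):
    """Sketch of a loss over the magnitudes of enhanced speech and predicted
    noise. The predicted noise is defined as (noisy - enhanced), so matching
    both magnitudes implicitly constrains the phase of the enhanced signal.
    The paper's exact weighting and normalization may differ."""
    noise_true = noisy - clean
    noise_pred = noisy - enhanced  # constrained by the mixture signal
    loss_speech = np.mean(np.abs(stft_mag(enhanced) - stft_mag(clean)))
    loss_noise = np.mean(np.abs(stft_mag(noise_pred) - stft_mag(noise_true)))
    return loss_speech + loss_noise
```

If the enhanced signal equals the clean signal, both terms vanish; any deviation in magnitude or in the implied noise estimate increases the loss.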