Dense CNN with Self-Attention for Time-Domain Speech Enhancement.

Authors

Pandey Ashutosh, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1270-1279. doi: 10.1109/taslp.2021.3064421. Epub 2021 Mar 8.

DOI: 10.1109/taslp.2021.3064421
PMID: 33997107
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8118093/
Abstract

Speech enhancement in the time domain has become increasingly popular in recent years because it can jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and a predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
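The abstract's loss combines magnitude errors on the enhanced speech and on a predicted noise. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the predicted noise is the residual `mixture - estimate`, a Hann-windowed STFT with 512-sample frames and a 256-sample hop, an L1 distance on STFT magnitudes, and equal 0.5 weights on the two terms. All function names are hypothetical. It illustrates how a noise term can constrain phase: a sign-flipped estimate `-clean` has exactly the same STFT magnitude as the clean speech, so a magnitude-only loss scores it as perfect, while its residual `mixture + clean` no longer matches the true noise, so the combined loss is positive.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude of a simple STFT: Hann-windowed frames, one rfft per frame.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def magnitude_loss(estimate, reference):
    """Mean absolute error between STFT magnitudes; phase is ignored entirely."""
    return float(np.mean(np.abs(stft_magnitude(estimate) - stft_magnitude(reference))))


def speech_and_noise_magnitude_loss(estimate, clean, mixture):
    """Hypothetical combined loss: magnitude error on speech plus on noise.

    Assumption: predicted noise = mixture - estimate, true noise = mixture - clean,
    and the two terms are weighted equally.
    """
    predicted_noise = mixture - estimate
    true_noise = mixture - clean
    return 0.5 * magnitude_loss(estimate, clean) + 0.5 * magnitude_loss(predicted_noise, true_noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # stand-in for 1 s of 16 kHz speech
    noise = 0.5 * rng.standard_normal(16000)
    mixture = clean + noise

    flipped = -clean  # same STFT magnitude as clean, every bin's phase shifted by pi
    print(np.isclose(magnitude_loss(flipped, clean), 0.0))                  # True
    print(speech_and_noise_magnitude_loss(flipped, clean, mixture) > 0.0)  # True
```

Because the mixture is fixed, matching both the speech magnitude and the residual-noise magnitude leaves far fewer phase choices for the estimate than matching the speech magnitude alone.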

Similar articles

1. Dense CNN with Self-Attention for Time-Domain Speech Enhancement.
   IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1270-1279. doi: 10.1109/taslp.2021.3064421. Epub 2021 Mar 8.
2. Convolutional fusion network for monaural speech enhancement.
   Neural Netw. 2021 Nov;143:97-107. doi: 10.1016/j.neunet.2021.05.017. Epub 2021 May 25.
3. A lightweight speech enhancement network fusing bone- and air-conducted speech.
   J Acoust Soc Am. 2024 Aug 1;156(2):1355-1366. doi: 10.1121/10.0028339.
4. A New Framework for CNN-Based Speech Enhancement in the Time Domain.
   IEEE/ACM Trans Audio Speech Lang Process. 2019 Jul;27(7):1179-1188. doi: 10.1109/taslp.2019.2913512. Epub 2019 Apr 29.
5. Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network.
   PLoS One. 2023 May 11;18(5):e0285629. doi: 10.1371/journal.pone.0285629. eCollection 2023.
6. Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.
   IEEE/ACM Trans Audio Speech Lang Process. 2022;30:1374-1385. doi: 10.1109/taslp.2022.3161143. Epub 2022 Mar 22.
7. Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.
   IEEE/ACM Trans Audio Speech Lang Process. 2020;28:380-390. doi: 10.1109/taslp.2019.2955276. Epub 2019 Nov 22.
8. DAM: Hierarchical Adaptive Feature Selection Using Convolution Encoder Decoder Network for Strawberry Segmentation.
   Front Plant Sci. 2021 Feb 22;12:591333. doi: 10.3389/fpls.2021.591333. eCollection 2021.
9. Computed Tomography (CT) Image Quality Enhancement via a Uniform Framework Integrating Noise Estimation and Super-Resolution Networks.
   Sensors (Basel). 2019 Jul 30;19(15):3348. doi: 10.3390/s19153348.
10. Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.
   PLoS One. 2024 Jan 3;19(1):e0291240. doi: 10.1371/journal.pone.0291240. eCollection 2024.

Cited by

1. A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments.
   Comput Speech Lang. 2025 Jan;89. doi: 10.1016/j.csl.2024.101677. Epub 2024 Jun 6.
2. End-to-end feature fusion for jointly optimized speech enhancement and automatic speech recognition.
   Sci Rep. 2025 Jul 2;15(1):22892. doi: 10.1038/s41598-025-05057-2.
3. CROSS-DOMAIN SPEECH ENHANCEMENT WITH A NEURAL CASCADE ARCHITECTURE.
   Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7862-7866. doi: 10.1109/icassp43922.2022.9747752. Epub 2022 Apr 27.
4. Time-Domain Speech Enhancement for Robust Automatic Speech Recognition.
   Interspeech. 2023 Aug;2023:4913-4917. doi: 10.21437/interspeech.2023-167.
5. A Deep Learning Model for Detecting the Arrival Time of Weak Underwater Signals in Fluvial Acoustic Tomography Systems.
   Sensors (Basel). 2025 Feb 3;25(3):922. doi: 10.3390/s25030922.
6. Fusing Bone-conduction and Air-conduction Sensors for Complex-Domain Speech Enhancement.
   IEEE/ACM Trans Audio Speech Lang Process. 2022;30:3134-3143. doi: 10.1109/taslp.2022.3209943. Epub 2022 Sep 26.
7. A Survey on Low-Latency DNN-Based Speech Enhancement.
   Sensors (Basel). 2023 Jan 26;23(3):1380. doi: 10.3390/s23031380.
8. Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.
   IEEE/ACM Trans Audio Speech Lang Process. 2022;30:1374-1385. doi: 10.1109/taslp.2022.3161143. Epub 2022 Mar 22.
9. Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.
   IEEE/ACM Trans Audio Speech Lang Process. 2022;30:734-743. doi: 10.1109/taslp.2021.3138716. Epub 2021 Dec 28.
10. Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models.
   Sensors (Basel). 2022 Jan 4;22(1):374. doi: 10.3390/s22010374.
