A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments.

Authors

Wang Heming, Pandey Ashutosh, Wang DeLiang

Affiliations

The Ohio State University, 281 W Lane Ave, Columbus, 43210 OH, United States.

Center for Cognitive and Brain Science, 1835 Neil Ave, Columbus, 43210 OH, United States.

Publication information

Comput Speech Lang. 2025 Jan;89. doi: 10.1016/j.csl.2024.101677. Epub 2024 Jun 6.

DOI:10.1016/j.csl.2024.101677
PMID:40895519
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12396636/
Abstract

Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most of the existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs: ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that incorporating larger window sizes is helpful for dereverberation, and adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations, and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with proposed techniques achieve superior performance compared with other strong enhancement baselines.
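
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a waveform is cut into overlapping windows, each window is encoded by a linear transform, and the result is decoded and overlap-added back into a waveform. Everything here is an assumption made for illustration. The function names, the window and hop sizes, and the fixed orthonormal DCT (standing in for a transform that the paper's models would learn) are hypothetical and are not taken from ARN or DC-CRN.

```python
import math

def frame_signal(x, win, hop):
    """Split list x into overlapping frames of length win, advancing by hop.
    The tail is zero-padded so every sample falls inside some frame."""
    n_frames = 1 + max(0, (len(x) - win + hop - 1) // hop)
    padded = x + [0.0] * ((n_frames - 1) * hop + win - len(x))
    return [padded[i * hop:i * hop + win] for i in range(n_frames)]

def dct_matrix(n):
    """Orthonormal DCT-II matrix. ASSUMPTION: a fixed stand-in for a
    learned linear encoder; a trained model would learn these weights."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def encode(frame, W):
    """Linear encoder: coefficients = W @ frame."""
    return [sum(w * v for w, v in zip(row, frame)) for row in W]

def decode(coeffs, W):
    """Linear decoder: frame = W^T @ coefficients (exact inverse when W
    is orthonormal)."""
    n = len(W[0])
    return [sum(W[k][i] * coeffs[k] for k in range(len(W))) for i in range(n)]

def overlap_add(frames, hop, length):
    """Sum frames at their hop offsets, average where they overlap, and
    trim to the original signal length."""
    win = len(frames[0])
    total = (len(frames) - 1) * hop + win
    out = [0.0] * total
    weight = [0.0] * total
    for f, frame in enumerate(frames):
        for i, v in enumerate(frame):
            out[f * hop + i] += v
            weight[f * hop + i] += 1.0
    return [o / w for o, w in zip(out, weight)][:length]

# Toy signal; window and hop sizes are arbitrary illustrative values.
signal = [math.sin(0.3 * n) for n in range(50)]
W = dct_matrix(8)
frames = frame_signal(signal, win=8, hop=4)
features = [encode(f, W) for f in frames]      # an enhancement model would act here
restored = overlap_add([decode(c, W) for c in features], hop=4, length=len(signal))
```

In this sketch a larger `win` gives each frame a longer span of the signal, which is the knob the abstract refers to as window size. An actual time-domain model would place a trained network between `encode` and `decode`.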

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b45/12396636/6d9ad61a04dd/nihms-2076289-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b45/12396636/b65341195a36/nihms-2076289-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b45/12396636/73f3a6eaf2e6/nihms-2076289-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b45/12396636/d4d6eab4636b/nihms-2076289-f0003.jpg
