Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.

Authors

Williamson Donald S, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.

Department of Computer Science and Engineering, Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2017 Jul;25(7):1492-1501. doi: 10.1109/TASLP.2017.2696307. Epub 2017 Apr 20.

DOI: 10.1109/TASLP.2017.2696307
PMID: 30112422
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6089240/
Abstract

In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
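
The abstract defines the complex ideal ratio mask as the mask that yields direct speech when applied to reverberant and noisy speech. The sketch below illustrates that definition under stated assumptions: the mask is taken to be the per-unit complex quotient of the direct-speech spectrum and the mixture spectrum, random complex arrays stand in for real STFTs, and the names `complex_ideal_ratio_mask` and `apply_mask` are hypothetical. In the paper the mask is estimated by a deep neural network; here it is computed from known direct speech purely for illustration.

```python
import numpy as np


def complex_ideal_ratio_mask(direct_stft, mixture_stft, eps=1e-12):
    """Return a complex mask M with M * mixture_stft ~= direct_stft.

    This is complex division S / Y written out in real and imaginary
    parts; eps is an assumption added here to avoid division by zero.
    """
    y_r, y_i = mixture_stft.real, mixture_stft.imag
    s_r, s_i = direct_stft.real, direct_stft.imag
    denom = y_r ** 2 + y_i ** 2 + eps
    m_r = (y_r * s_r + y_i * s_i) / denom  # real component of the mask
    m_i = (y_r * s_i - y_i * s_r) / denom  # imaginary component of the mask
    return m_r + 1j * m_i


def apply_mask(mask, mixture_stft):
    """Apply the mask by complex multiplication in each time-frequency unit."""
    return mask * mixture_stft


# Random stand-ins for STFTs (frequency bins x frames); not real speech.
rng = np.random.default_rng(0)
shape = (257, 100)
direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = direct + interference  # plays the role of reverberant, noisy speech

mask = complex_ideal_ratio_mask(direct, mixture)
recovered = apply_mask(mask, mixture)
max_error = np.max(np.abs(recovered - direct))

# For contrast: a magnitude-only ratio mask keeps the mixture phase,
# so it matches the direct magnitude but not the direct phase.
mag_only = (np.abs(direct) / np.abs(mixture)) * mixture
mag_only_error = np.max(np.abs(mag_only - direct))

print(f"complex mask max error:   {max_error:.2e}")
print(f"magnitude-only max error: {mag_only_error:.2e}")
```

The contrast with the magnitude-only mask is the point of the example: only the complex mask corrects both magnitude and phase, which is the property the abstract highlights.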

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/6089240/5809097eb43c/nihms959951f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/6089240/6c645ef32496/nihms959951f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/6089240/7f6338a2ef1f/nihms959951f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/6089240/d18b0d793321/nihms959951f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/6089240/469e8cc6470e/nihms959951f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/6089240/bc2a955d1ecc/nihms959951f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/6089240/4b5a2d868db7/nihms959951f7.jpg

Similar articles

1
Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.
IEEE/ACM Trans Audio Speech Lang Process. 2017 Jul;25(7):1492-1501. doi: 10.1109/TASLP.2017.2696307. Epub 2017 Apr 20.
2
Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):53-62. doi: 10.1109/TASLP.2018.2870725. Epub 2018 Sep 17.
3
Monaural Speech Dereverberation Using Temporal Convolutional Networks with Self Attention.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:1598-1607. doi: 10.1109/taslp.2020.2995273. Epub 2020 May 18.
4
A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions.
J Acoust Soc Am. 2018 Sep;144(3):1627. doi: 10.1121/1.5055562.
5
Complex Ratio Masking for Monaural Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.
6
Deep Learning Based Target Cancellation for Speech Dereverberation.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:941-950. doi: 10.1109/taslp.2020.2975902. Epub 2020 Feb 28.
7
Triple-0: Zero-shot denoising and dereverberation on an end-to-end frozen anechoic speech separation network.
PLoS One. 2024 Jul 16;19(7):e0301692. doi: 10.1371/journal.pone.0301692. eCollection 2024.
8
Deep Learning Based Binaural Speech Separation in Reverberant Environments.
IEEE/ACM Trans Audio Speech Lang Process. 2017 May;25(5):1075-1084. doi: 10.1109/TASLP.2017.2687104. Epub 2017 Mar 24.
9
Simultaneous suppression of noise and reverberation in cochlear implants using a ratio masking strategy.
J Acoust Soc Am. 2013 Nov;134(5):3759-65. doi: 10.1121/1.4823839.
10
Intelligibility of reverberant noisy speech with ideal binary masking.
J Acoust Soc Am. 2011 Oct;130(4):2153-61. doi: 10.1121/1.3631668.

Cited by

1
Deep-Learning Framework for Efficient Real-Time Speech Enhancement and Dereverberation.
Sensors (Basel). 2025 Jan 22;25(3):630. doi: 10.3390/s25030630.
2
Impact of Mask Type as Training Target for Speech Intelligibility and Quality in Cochlear-Implant Noise Reduction.
Sensors (Basel). 2024 Oct 14;24(20):6614. doi: 10.3390/s24206614.
3
Triple-0: Zero-shot denoising and dereverberation on an end-to-end frozen anechoic speech separation network.
PLoS One. 2024 Jul 16;19(7):e0301692. doi: 10.1371/journal.pone.0301692. eCollection 2024.
4
Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods.
Trends Hear. 2023 Jan-Dec;27:23312165231209913. doi: 10.1177/23312165231209913.
5
A Survey on Low-Latency DNN-Based Speech Enhancement.
Sensors (Basel). 2023 Jan 26;23(3):1380. doi: 10.3390/s23031380.
6
Speech Enhancement by Multiple Propagation through the Same Neural Network.
Sensors (Basel). 2022 Mar 22;22(7):2440. doi: 10.3390/s22072440.
7
Deep Learning Based Target Cancellation for Speech Dereverberation.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:941-950. doi: 10.1109/taslp.2020.2975902. Epub 2020 Feb 28.
8
Using Machine Learning to Mitigate the Effects of Reverberation and Noise in Cochlear Implants.
Proc Meet Acoust. 2018 May 7;33(1). doi: 10.1121/2.0000905. Epub 2018 Oct 8.
9
Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):53-62. doi: 10.1109/TASLP.2018.2870725. Epub 2018 Sep 17.
