Three-stage hybrid neural beamformer for multi-channel speech enhancement.

Affiliations

Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China.

State Key Laboratory of Acoustics, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China.

Publication information

J Acoust Soc Am. 2023 Jun 1;153(6):3378. doi: 10.1121/10.0019802.

PMID:37342887
Abstract

This paper proposes a hybrid neural beamformer for multi-channel speech enhancement, called TriU-Net, which comprises three stages: beamforming, post-filtering, and distortion compensation. TriU-Net first estimates a set of masks that are used within a minimum variance distortionless response (MVDR) beamformer. A deep neural network (DNN)-based post-filter then suppresses the residual noise. Finally, a DNN-based distortion compensator further improves speech quality. To characterize long-range temporal dependencies more efficiently, a network topology called the gated convolutional attention network is proposed and used in TriU-Net. The advantage of the proposed model is that speech distortion compensation is explicitly considered, yielding higher speech quality and intelligibility. The proposed model achieved an average wb-PESQ score of 2.854 and an ESTOI of 92.57% on the CHiME-3 dataset. In addition, extensive experiments on synthetic data and real recordings confirm the effectiveness of the proposed method in noisy, reverberant environments.
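The abstract says that TriU-Net's first stage feeds estimated masks into an MVDR beamformer but does not give the exact formulation. As a hedged sketch, the NumPy code below shows one widely used mask-based MVDR variant: mask-weighted spatial covariance matrices for speech and noise, a steering vector taken as the principal eigenvector of the speech covariance, and weights w(f) = Φ_n(f)^{-1} d(f) / (d(f)^H Φ_n(f)^{-1} d(f)). The function names, the (channels, frequency, frames) array layout, and the diagonal-loading constant are illustrative assumptions, not the authors' code; in TriU-Net the masks would come from the network, whereas here they are simply inputs.

```python
import numpy as np


def masked_scm(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix per frequency bin.

    Y:    complex STFT, shape (C, F, T) = (channels, freq bins, frames)
    mask: real-valued mask in [0, 1], shape (F, T)
    returns: array of shape (F, C, C)
    """
    # Sum over frames of mask(f, t) * y(f, t) y(f, t)^H
    num = np.einsum("ft,cft,dft->fcd", mask, Y, Y.conj())
    # Normalize by the total mask weight in each frequency bin
    den = mask.sum(axis=-1)[:, None, None] + eps
    return num / den


def mvdr_weights(phi_s, phi_n, diag_load=1e-6):
    """MVDR weights w(f) = Phi_n^{-1} d / (d^H Phi_n^{-1} d).

    The steering vector d(f) is estimated as the principal eigenvector of
    the speech covariance (an assumed, common choice).
    """
    F, C, _ = phi_s.shape
    W = np.empty((F, C), dtype=complex)
    eye = np.eye(C)
    for f in range(F):
        # eigh returns eigenvalues in ascending order; take the last eigenvector
        _, vecs = np.linalg.eigh(phi_s[f])
        d = vecs[:, -1]
        # Diagonal loading keeps the noise covariance well conditioned
        phi_n_f = phi_n[f] + diag_load * np.trace(phi_n[f]).real / C * eye
        num = np.linalg.solve(phi_n_f, d)
        W[f] = num / (d.conj() @ num)
    return W


def apply_beamformer(W, Y):
    """Beamformer output X_hat(f, t) = w(f)^H y(f, t), shape (F, T)."""
    return np.einsum("fc,cft->ft", W.conj(), Y)
```

By construction the weights satisfy the distortionless constraint w(f)^H d(f) = 1 for the estimated steering vector. The DNN post-filter and distortion-compensation stages would then operate on the output of `apply_beamformer`; their architectures are not specified in the abstract, so they are not sketched here.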

Similar articles

1. Three-stage hybrid neural beamformer for multi-channel speech enhancement.
   J Acoust Soc Am. 2023 Jun 1;153(6):3378. doi: 10.1121/10.0019802.
2. Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments.
   Sensors (Basel). 2020 Mar 28;20(7):1883. doi: 10.3390/s20071883.
3. Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.
   IEEE/ACM Trans Audio Speech Lang Process. 2020;28:1778-1787. doi: 10.1109/taslp.2020.2998279. Epub 2020 May 28.
4. Real-time dual-channel speech enhancement by VAD assisted MVDR beamformer for hearing aid applications using smartphone.
   Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:952-955. doi: 10.1109/EMBC44109.2020.9175212.
5. Improved Speech Spatial Covariance Matrix Estimation for Online Multi-Microphone Speech Enhancement.
   Sensors (Basel). 2022 Dec 22;23(1):111. doi: 10.3390/s23010111.
6. Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition.
   Neural Netw. 2021 Sep;141:225-237. doi: 10.1016/j.neunet.2021.04.017. Epub 2021 Apr 19.
7. Real-time single-channel deep neural network-based speech enhancement on edge devices.
   Interspeech. 2020 Oct;2020:3281-3285. doi: 10.21437/Interspeech.2020-1901.
8. End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement.
   Sensors (Basel). 2022 Oct 13;22(20):7782. doi: 10.3390/s22207782.
9. Deep Learning Based Binaural Speech Separation in Reverberant Environments.
   IEEE/ACM Trans Audio Speech Lang Process. 2017 May;25(5):1075-1084. doi: 10.1109/TASLP.2017.2687104. Epub 2017 Mar 24.
10. Speech recognition with a hearing-aid processing scheme combining beamforming with mask-informed speech enhancement.
    Trends Hear. 2022 Jan-Dec;26:23312165211068629. doi: 10.1177/23312165211068629.

Articles citing this paper

1. Effective Acoustic Model-Based Beamforming Training for Static and Dynamic Hri Applications.
   Sensors (Basel). 2024 Oct 15;24(20):6644. doi: 10.3390/s24206644.