
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation.

Author Information

Wang Zhong-Qiu, Wang Peidong, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210-1277 USA (at the time of this work); now with Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA.

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210-1277 USA.

Publication Information

IEEE/ACM Trans Audio Speech Lang Process. 2021;29:2001-2014. doi: 10.1109/taslp.2021.3083405. Epub 2021 May 26.

DOI:10.1109/taslp.2021.3083405
PMID:34212067
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8240467/
Abstract

We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset.
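To make the input/output convention concrete, the following minimal sketch (Python/NumPy with SciPy's STFT) builds the stacked real/imaginary (RI) features from a fixed-geometry multi-channel mixture and converts a predicted RI pair back to a waveform. The frame size, hop size, reference-microphone choice, and function names are illustrative assumptions; the abstract does not specify the network architecture or loss, so the DNN itself is left out.

```python
import numpy as np
from scipy.signal import stft, istft

N_FFT, HOP = 512, 256  # assumed STFT settings, not taken from the paper


def stack_ri_features(mixture):
    """Stack per-microphone real/imaginary STFT components as DNN input.

    mixture: (num_mics, num_samples) time-domain multi-channel mixture.
    Returns (2 * num_mics, num_bins, num_frames): the RI components of all
    microphones, which the network maps to the RI components of each target
    speaker at a chosen reference microphone (multi-microphone complex
    spectral mapping).
    """
    _, _, spec = stft(mixture, nperseg=N_FFT, noverlap=N_FFT - HOP, axis=-1)
    return np.concatenate([spec.real, spec.imag], axis=0)


def ri_target(target_at_ref_mic):
    """Training target: RI components of the clean, dereverberated source at
    the reference microphone, shape (2, num_bins, num_frames)."""
    _, _, spec = stft(target_at_ref_mic, nperseg=N_FFT, noverlap=N_FFT - HOP)
    return np.stack([spec.real, spec.imag], axis=0)


def resynthesize(ri_pred):
    """Convert a predicted RI pair (2, num_bins, num_frames) back to audio."""
    _, wav = istft(ri_pred[0] + 1j * ri_pred[1], nperseg=N_FFT,
                   noverlap=N_FFT - HOP)
    return wav
```

Since the system targets both separation and dereverberation, the training target above is the dereverberated source at the reference microphone; training with a fixed array geometry is what allows a network trained on simulated RIRs to transfer to a real array with the same geometry.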

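The abstract also combines the DNN estimates with MVDR beamforming and post-filtering. The sketch below is one common estimate-driven MVDR recipe under stated assumptions (per-frequency spatial covariances computed from the DNN target estimate and the residual, steering vector taken as the principal eigenvector of the target covariance); it is not claimed to be the authors' exact procedure, and the post-filter and the frame-level speaker counting used for block-online CSS are only noted in comments.

```python
import numpy as np


def mvdr_from_estimate(mix_stft, tgt_stft, eps=1e-6):
    """MVDR beamforming driven by a DNN-estimated target.

    mix_stft: (M, F, T) complex multi-channel mixture STFT.
    tgt_stft: (M, F, T) complex target estimate at each microphone (an
              assumption here; a common shortcut reuses the reference-mic
              estimate for all channels, which is only an approximation).
    Returns the beamformed target STFT of shape (F, T). A post-filtering
    network can then refine this output, and for block-online continuous
    speech separation the same processing runs per block, with frame-level
    speaker counting deciding how many streams to emit and stitch.
    """
    n_mics, n_bins, _ = mix_stft.shape
    out = np.zeros(mix_stft.shape[1:], dtype=complex)
    for f in range(n_bins):
        y = mix_stft[:, f, :]                     # (M, T) mixture frames
        s = tgt_stft[:, f, :]                     # (M, T) target estimate
        n = y - s                                 # residual: interference + noise
        phi_s = s @ s.conj().T / s.shape[1]       # target spatial covariance
        phi_n = n @ n.conj().T / n.shape[1] + eps * np.eye(n_mics)
        _, vecs = np.linalg.eigh(phi_s)           # Hermitian eigendecomposition
        d = vecs[:, -1]                           # steering vector (principal eigenvector)
        w = np.linalg.solve(phi_n, d)
        w = w / (d.conj() @ w)                    # w = Phi_n^{-1} d / (d^H Phi_n^{-1} d)
        out[f] = w.conj() @ y                     # distortionless target estimate
    return out
```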

Figures (PMC8240467):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d7/8240467/344d06586268/nihms-1715465-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d7/8240467/c98bf3ddeb88/nihms-1715465-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d7/8240467/f79e56f1d5c9/nihms-1715465-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d7/8240467/5d2a54267aea/nihms-1715465-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d7/8240467/b20d16d120f8/nihms-1715465-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d7/8240467/20560553039b/nihms-1715465-f0006.jpg

Similar Articles

1. Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation. IEEE/ACM Trans Audio Speech Lang Process. 2021;29:2001-2014. doi: 10.1109/taslp.2021.3083405. Epub 2021 May 26.
2. Deep Learning Based Target Cancellation for Speech Dereverberation. IEEE/ACM Trans Audio Speech Lang Process. 2020;28:941-950. doi: 10.1109/taslp.2020.2975902. Epub 2020 Feb 28.
3. Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR. IEEE/ACM Trans Audio Speech Lang Process. 2020;28:1778-1787. doi: 10.1109/taslp.2020.2998279. Epub 2020 May 28.
4. Deep Learning for Talker-dependent Reverberant Speaker Separation: An Empirical Study. IEEE/ACM Trans Audio Speech Lang Process. 2019 Nov;27(11):1839-1848. doi: 10.1109/taslp.2019.2934319. Epub 2019 Aug 12.
5. A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions. J Acoust Soc Am. 2020 Sep;148(3):1157. doi: 10.1121/10.0001779.
6. A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation. Neural Netw. 2021 Sep;141:238-248. doi: 10.1016/j.neunet.2021.04.023. Epub 2021 Apr 21.
7. Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments. Sensors (Basel). 2020 Mar 28;20(7):1883. doi: 10.3390/s20071883.
8. Causal Deep CASA for Monaural Talker-Independent Speaker Separation. IEEE/ACM Trans Audio Speech Lang Process. 2020;28:2109-2118. doi: 10.1109/taslp.2020.3007779. Epub 2020 Jul 8.
9. Deep Learning Based Binaural Speech Separation in Reverberant Environments. IEEE/ACM Trans Audio Speech Lang Process. 2017 May;25(5):1075-1084. doi: 10.1109/TASLP.2017.2687104. Epub 2017 Mar 24.
10. A Real-Time Speech Separation Method Based on Camera and Microphone Array Sensors Fusion Approach. Sensors (Basel). 2020 Jun 22;20(12):3527. doi: 10.3390/s20123527.

Cited By

1. Brain-Controlled Augmented Hearing for Spatially Moving Conversations in Multi-Talker Environments. Adv Sci (Weinh). 2024 Nov;11(41):e2401379. doi: 10.1002/advs.202401379. Epub 2024 Sep 9.

References

1. Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR. IEEE/ACM Trans Audio Speech Lang Process. 2020;28:1778-1787. doi: 10.1109/taslp.2020.2998279. Epub 2020 May 28.
2. Deep Learning Based Target Cancellation for Speech Dereverberation. IEEE/ACM Trans Audio Speech Lang Process. 2020;28:941-950. doi: 10.1109/taslp.2020.2975902. Epub 2020 Feb 28.
3. Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation. IEEE/ACM Trans Audio Speech Lang Process. 2019;27(12):2092-2102. doi: 10.1109/taslp.2019.2941148. Epub 2019 Sep 12.
4. Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation. IEEE/ACM Trans Audio Speech Lang Process. 2019 Aug;27(8):1256-1266. doi: 10.1109/TASLP.2019.2915167. Epub 2019 May 6.
5. Supervised Speech Separation Based on Deep Learning: An Overview. IEEE/ACM Trans Audio Speech Lang Process. 2018 Oct;26(10):1702-1726. doi: 10.1109/TASLP.2018.2842159. Epub 2018 May 30.
6. Deep Clustering and Conventional Networks for Music Separation: Stronger Together. Proc IEEE Int Conf Acoust Speech Signal Process. 2017 Mar;2017:61-65. doi: 10.1109/ICASSP.2017.7952118. Epub 2017 Jun 19.
7. Complex Ratio Masking for Monaural Speech Separation. IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.