Impact of phase estimation on single-channel speech separation based on time-frequency masking.

Authors

Mayer Florian, Williamson Donald S, Mowlaee Pejman, Wang DeLiang

Affiliations

FH Joanneum - University of Applied Sciences, Graz, Austria.

Department of Computer Science, Indiana University, Bloomington, Indiana 47405, USA.

Publication information

J Acoust Soc Am. 2017 Jun;141(6):4668. doi: 10.1121/1.4986647.

DOI: 10.1121/1.4986647
PMID: 28679243
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6909979/
Abstract

Time-frequency masking is a common solution for the single-channel source separation (SCSS) problem where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. An estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase combined with the estimated magnitude spectrum using a conventional model-based approach. As the proposed phase estimator requires estimated fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. Also, the experiments demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratio and noise scenarios.
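The reconstruction step described in the abstract can be illustrated with a minimal single-frame sketch. It is not the article's model-based phase estimator or pitch estimator: the toy sinusoidal signals, the oracle ratio mask, and the helper names (`dft`, `idft`, `reconstruct`, `snr_db`) are all assumptions made for illustration, and the clean phase is taken directly from the target signal, which corresponds to the upper-bound condition rather than to an estimated phase.

```python
import cmath
import math

def dft(x):
    """Naive DFT of one frame (O(n^2), adequate for a toy example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; the real part is returned because the inputs are real signals."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def snr_db(reference, estimate):
    """Signal-to-noise ratio of an estimate against the reference, in dB."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal / error)

def reconstruct(mask, mixture_spec, phase_spec):
    """Combine the masked mixture magnitude with the phase of `phase_spec`."""
    masked = [cmath.rect(m * abs(y), cmath.phase(p))
              for m, y, p in zip(mask, mixture_spec, phase_spec)]
    return idft(masked)

N = 64  # frame length (hypothetical)

# Toy target and interferer; both have energy in bin 8, with different phases.
target = [math.sin(2 * math.pi * 4 * t / N)
          + 0.5 * math.sin(2 * math.pi * 8 * t / N + 0.3) for t in range(N)]
interferer = [0.8 * math.sin(2 * math.pi * 5 * t / N + 1.0)
              + 0.6 * math.sin(2 * math.pi * 8 * t / N + 2.0) for t in range(N)]
mixture = [s + v for s, v in zip(target, interferer)]

S, V, Y = dft(target), dft(interferer), dft(mixture)

# Oracle ratio mask (assumption: a real system would estimate this mask).
EPS = 1e-12
mask = [abs(s) / (abs(s) + abs(v) + EPS) for s, v in zip(S, V)]

# Conventional reconstruction: masked magnitude + mixture phase.
est_mixture_phase = reconstruct(mask, Y, Y)
# Upper bound: same masked magnitude + clean target phase.
est_clean_phase = reconstruct(mask, Y, S)

snr_mixture_phase = snr_db(target, est_mixture_phase)
snr_clean_phase = snr_db(target, est_clean_phase)
print(f"mixture phase: {snr_mixture_phase:.2f} dB")
print(f"clean phase:   {snr_clean_phase:.2f} dB")
```

For a fixed magnitude, the per-bin error is smallest when the phase equals the target phase, so in this sketch the clean-phase reconstruction cannot score lower than the mixture-phase one. How much of that gap an estimated phase recovers in practice is what the article evaluates.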
