Similar articles

1
Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.
J Acoust Soc Am. 2015 Sep;138(3):1399-407. doi: 10.1121/1.4928612.
2
Perceptual effects of noise reduction by time-frequency masking of noisy speech.
J Acoust Soc Am. 2012 Oct;132(4):2690-9. doi: 10.1121/1.4747006.
3
Sparse Nonnegative Matrix Factorization Strategy for Cochlear Implants.
Trends Hear. 2015 Dec 30;19:2331216515616941. doi: 10.1177/2331216515616941.
4
An ideal quantized mask to increase intelligibility and quality of speech in noise.
J Acoust Soc Am. 2018 Sep;144(3):1392. doi: 10.1121/1.5053115.
5
Reconstruction techniques for improving the perceptual quality of binary masked speech.
J Acoust Soc Am. 2014 Aug;136(2):892-902. doi: 10.1121/1.4884759.
6
Speech intelligibility in reverberation with ideal binary masking: effects of early reflections and signal-to-noise ratio threshold.
J Acoust Soc Am. 2013 Mar;133(3):1707-17. doi: 10.1121/1.4789895.
7
Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise.
J Acoust Soc Am. 2011 Apr;129(4):2227-36. doi: 10.1121/1.3559707.
8
Spectro-temporal modulation energy based mask for robust speaker identification.
J Acoust Soc Am. 2012 May;131(5):EL368-74. doi: 10.1121/1.3697534.
9
The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio.
J Acoust Soc Am. 2013 Nov;134(5):EL452-8. doi: 10.1121/1.4824632.
10
Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.
J Acoust Soc Am. 2011 Sep;130(3):1475-87. doi: 10.1121/1.3621502.

Articles citing this paper

1
An ideal quantized mask to increase intelligibility and quality of speech in noise.
J Acoust Soc Am. 2018 Sep;144(3):1392. doi: 10.1121/1.5053115.
2
Impact of phase estimation on single-channel speech separation based on time-frequency masking.
J Acoust Soc Am. 2017 Jun;141(6):4668. doi: 10.1121/1.4986647.
3
Complex Ratio Masking for Monaural Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.

Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Author information

Williamson Donald S, Wang Yuxuan, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA.

Publication information

J Acoust Soc Am. 2015 Sep;138(3):1399-407. doi: 10.1121/1.4928612.

DOI:10.1121/1.4928612
PMID:26428778
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5392055/
Abstract

As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. By contrast, nonnegative matrix factorization (NMF) addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. An ideal ratio mask is estimated, which separates speech from noise with reasonable sound quality. A deep neural network then approximates clean speech by estimating activation weights from the ratio-masked speech, where the weights linearly combine elements from an NMF speech model. Systematic comparisons using objective metrics, including the perceptual evaluation of speech quality, show that the proposed algorithm achieves higher speech quality than related masking and NMF methods. In addition, a listening test was performed, and its results show that the output of the proposed algorithm is preferred over the comparison systems in terms of speech quality.
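The abstract describes the two-stage pipeline only at a high level. The NumPy sketch below illustrates its two ingredients on toy magnitude spectrograms, under stated assumptions. The ratio mask uses the form IRM = (S^2 / (S^2 + N^2))^beta with beta = 0.5, a common definition in this line of work that is assumed here rather than taken from the abstract. For the second stage, the paper trains a DNN to predict activations; this sketch instead swaps in classic fixed-basis NMF (Lee-Seung multiplicative updates for the squared Euclidean cost) as a plainly labeled stand-in. The speech basis `W_speech`, the helper names, and all data are hypothetical toy values, not a learned model.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    # Per T-F unit: (S^2 / (S^2 + N^2))^beta; beta = 0.5 is an assumed choice.
    s_pow = np.square(speech_mag)
    n_pow = np.square(noise_mag)
    return (s_pow / (s_pow + n_pow + eps)) ** beta

def estimate_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    # Stand-in for the paper's DNN: find H >= 0 with V ~= W @ H while W stays fixed,
    # using Lee-Seung multiplicative updates for the squared Euclidean cost.
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def relative_error(V, V_hat):
    # Frobenius-norm error of V_hat relative to the reference V.
    return np.linalg.norm(V - V_hat) / np.linalg.norm(V)

# Hypothetical toy data: 64 frequency bins, 100 frames, 16 speech basis vectors.
rng = np.random.default_rng(1)
W_speech = rng.random((64, 16))               # pretend NMF speech model (columns = bases)
speech_mag = W_speech @ rng.random((16, 100))
noise_mag = 2.0 * rng.random((64, 100))
mixture_mag = speech_mag + noise_mag          # simplification: real magnitudes do not add

mask = ideal_ratio_mask(speech_mag, noise_mag)  # "ideal" means oracle S and N are used
masked_mag = mask * mixture_mag                  # T-F masking: one gain per T-F unit

H = estimate_activations(masked_mag, W_speech)
speech_estimate = W_speech @ H                   # linear combination of speech bases
print(relative_error(speech_mag, masked_mag), relative_error(speech_mag, speech_estimate))
```

Restricting the output to combinations of speech basis vectors is what lets the second stage pull the masked estimate back toward speech-like spectra; in the paper, a trained DNN replaces the iterative solver so that activations come from a single forward pass.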
