Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Author information

Williamson Donald S, Wang Yuxuan, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA.

Publication information

J Acoust Soc Am. 2015 Sep;138(3):1399-407. doi: 10.1121/1.4928612.

DOI: 10.1121/1.4928612

PMID: 26428778

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5392055/

Abstract

As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. Nonnegative matrix factorization (NMF), on the other hand, addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. First, an ideal ratio mask is estimated; applying it separates speech from noise with reasonable sound quality. A deep neural network then approximates clean speech by estimating activation weights from the ratio-masked speech, where the weights linearly combine elements of an NMF speech model. Systematic comparisons using objective metrics, including the perceptual evaluation of speech quality (PESQ), show that the proposed algorithm achieves higher speech quality than related masking and NMF methods. In addition, a listening test was performed, and its results show that listeners prefer the output of the proposed algorithm over that of the comparison systems in terms of speech quality.
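The abstract describes a two-stage pipeline: a ratio mask is applied to the noisy time-frequency representation, and the masked result is then approximated as a nonnegative combination of NMF speech basis vectors. The Python sketch below illustrates those two operations on synthetic magnitude spectrograms under several assumptions that are not taken from the paper: the mask is the oracle ideal ratio mask in the common square-root form (beta = 0.5), the NMF model uses Lee-Seung multiplicative updates for the Euclidean cost, mixture magnitudes are treated as additive, and, because no trained network is available here, fixed-basis NMF on the masked speech stands in for the DNN that the paper trains to estimate activations. The helper names (`ideal_ratio_mask`, `nmf`, `activations`, `rel_err`) are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    # IRM per time-frequency unit: (S^2 / (S^2 + N^2)) ** beta, values in [0, 1].
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return (s2 / (s2 + n2 + EPS)) ** beta

def nmf(V, rank, n_iter=200, seed=0):
    # Lee-Seung multiplicative updates for the Euclidean cost ||V - W H||_F^2.
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def activations(V, W, n_iter=200, seed=0):
    # Same H update with the basis W held fixed: nonnegative H such that V ~ W H.
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H

def rel_err(est, ref):
    # Relative Frobenius-norm error of an estimate against a reference.
    return np.linalg.norm(est - ref) / np.linalg.norm(ref)

# Synthetic magnitude "spectrograms": 64 frequency bins x 100 frames.
rng = np.random.default_rng(1)
clean = rng.random((64, 5)) @ rng.random((5, 100))  # low-rank stand-in for speech
noise = 0.8 * rng.random((64, 100))
mixture = clean + noise  # assumes magnitudes add, which real STFTs do not exactly

# Stage 1: ratio masking (oracle IRM here, not an estimated mask).
masked = ideal_ratio_mask(clean, noise) * mixture

# NMF speech model learned from clean speech.
W_s, _ = nmf(clean, rank=5)

# Stage 2 stand-in: the paper uses a DNN to map masked speech to activations;
# here fixed-basis NMF on the masked speech plays that role instead.
H_est = activations(masked, W_s)
enhanced = W_s @ H_est

print(f"mixture  error: {rel_err(mixture, clean):.3f}")
print(f"masked   error: {rel_err(masked, clean):.3f}")
print(f"enhanced error: {rel_err(enhanced, clean):.3f}")
```

Because each mask value is at most s / sqrt(s^2 + n^2), every masked unit lies between the clean and the mixture magnitude, so the masked error printed here cannot exceed the mixture error. The enhanced output is constrained to the span of the speech basis `W_s`, which is the property the NMF stage relies on; how close it gets to clean speech depends on the activation estimator, and the paper's DNN is not reproduced here.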
