Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA.
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA.
J Acoust Soc Am. 2019 Mar;145(3):1378. doi: 10.1121/1.5093547.
For deep learning based speech segregation to have translational significance as a noise-reduction tool, it must perform in a wide variety of acoustic environments. In the current study, performance was examined when target speech was subjected to interference from a single talker and room reverberation. Conditions were compared in which an algorithm was trained to remove both reverberation and interfering speech, or only interfering speech. A recurrent neural network incorporating bidirectional long short-term memory was trained to estimate the ideal ratio mask corresponding to target speech. Substantial intelligibility improvements were found for hearing-impaired (HI) and normal-hearing (NH) listeners across a range of target-to-interferer ratios (TIRs). HI listeners performed better with reverberation removed, whereas NH listeners demonstrated no difference. Algorithm benefit averaged 56 percentage points for the HI listeners at the least-favorable TIR, allowing these listeners to perform numerically better than young NH listeners without processing. The current study highlights the difficulty associated with perceiving speech in reverberant-noisy environments, and it extends the range of environments in which deep learning based speech segregation can be effectively applied. This increasingly wide array of environments includes not only a variety of background noises and interfering speech, but also room reverberation.
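The abstract names the ideal ratio mask (IRM) as the network's training target and compares two training conditions: removing both reverberation and the interfering talker, or removing only the interfering talker. The sketch below shows how such a target could be computed for each condition. It is illustrative only. It assumes the common IRM form (S^2 / (S^2 + N^2))^beta with beta = 0.5, a 20 ms / 10 ms Hann-window STFT at 16 kHz, that everything in the mixture other than the chosen target counts as interference, and that the TIR is set on the reverberant signals. `irm_for_condition` is a hypothetical helper, and the synthetic signals and toy impulse response are stand-ins rather than the study's stimuli or parameters. The BLSTM that estimates the mask is not shown.

```python
import numpy as np

def scale_to_tir(target, interferer, tir_db):
    """Rescale `interferer` so that 10*log10(P_target / P_interferer) == tir_db."""
    p_t = np.mean(target ** 2)
    p_i = np.mean(interferer ** 2)
    return interferer * np.sqrt(p_t / (p_i * 10 ** (tir_db / 10)))

def stft_power(x, frame_len=320, hop=160):
    """Power spectrogram |X(t, f)|^2 from Hann-windowed frames (20 ms / 10 ms at 16 kHz)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def ideal_ratio_mask(target, interference, beta=0.5, eps=1e-12):
    """IRM(t, f) = (S^2 / (S^2 + N^2))^beta; values lie in [0, 1]."""
    s2 = stft_power(target)
    n2 = stft_power(interference)
    return (s2 / (s2 + n2 + eps)) ** beta

def irm_for_condition(direct_target, reverberant_target, reverberant_interferer, remove_reverb):
    """Hypothetical helper: choose the training target for one of the two conditions."""
    mixture = reverberant_target + reverberant_interferer
    # Remove reverberation + interferer: only the direct-path target counts as speech,
    # so the target's reverberant tail is treated as interference.
    # Remove interferer only: the reverberant target is kept as the desired signal.
    target = direct_target if remove_reverb else reverberant_target
    return ideal_ratio_mask(target, mixture - target)

# Synthetic demo signals (illustrative stand-ins, not speech)
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
dry_target = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
dry_interferer = rng.standard_normal(fs)  # stand-in for a competing talker

# Toy room impulse response: unit direct path plus an exponentially decaying noise tail,
# so the direct-path target equals the dry target here
rir = 0.3 * rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800)
rir[0] = 1.0

rev_target = np.convolve(dry_target, rir)[:fs]
rev_interferer = scale_to_tir(rev_target, np.convolve(dry_interferer, rir)[:fs], tir_db=-5.0)

mask_derev = irm_for_condition(dry_target, rev_target, rev_interferer, remove_reverb=True)
mask_rev = irm_for_condition(dry_target, rev_target, rev_interferer, remove_reverb=False)
print(mask_derev.shape, float(mask_derev.mean()), float(mask_rev.mean()))
```

In use, a network's estimated mask would be multiplied with the mixture's STFT and the result resynthesized to a waveform. That step is omitted here.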