Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom.
Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom.
J Acoust Soc Am. 2019 Jul;146(1):705. doi: 10.1121/1.5119226.
Speech-in-noise perception is a major problem for users of cochlear implants (CIs), especially with non-stationary background noise. Noise-reduction algorithms have produced benefits but have relied on a priori information about the target speaker and/or background noise. A recurrent neural network (RNN) algorithm was developed for enhancing speech in non-stationary noise, and its benefits for speech perception were evaluated using both objective measures and experiments with CI simulations and CI users. The RNN was trained using speech from many talkers mixed with multi-talker or traffic noise recordings. Its performance was evaluated using speech from an unseen talker mixed with different noise recordings of the same class, either babble or traffic noise. Objective measures indicated benefits of using a recurrent over a feed-forward architecture, and predicted better speech intelligibility with the processing than without it. The experimental results showed significantly improved intelligibility of speech in babble noise but not in traffic noise. CI subjects rated the processed stimuli as significantly better than unprocessed stimuli in terms of speech distortions, noise intrusiveness, and overall quality, for both babble and traffic noise. These results extend previous findings for CI users to mostly unseen acoustic conditions with non-stationary noise.
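The abstract says that training and test material was made by mixing speech with babble or traffic noise recordings, but it does not give the mixing procedure or the signal-to-noise ratios (SNRs) used. As a minimal sketch of one common way to build such mixtures, assuming RMS-based level matching, the following scales a noise signal to a chosen SNR before adding it to the speech. The function names, the 5 dB value, and the toy signals are illustrative assumptions, not details from the study.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise RMS ratio equals `snr_db`, then add.

    Assumes both signals have the same length and non-zero power.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Gain that brings the noise to the level implied by the target SNR.
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Toy stand-ins for real recordings (hypothetical data, not from the study).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
achieved_snr_db = 20 * math.log10(rms(speech) / rms(scaled_noise))
```

In this sketch a noisy mixture paired with its clean speech would form one training example for an enhancement network. Holding out one talker and using separate noise recordings at test time would mirror the unseen-talker, same-noise-class evaluation the abstract describes.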