Lai Ying-Hui, Tsao Yu, Lu Xugang, Chen Fei, Su Yu-Ting, Chen Kuang-Chao, Chen Yu-Hsuan, Chen Li-Ching, Po-Hung Li Lieber, Lee Chin-Hui
Department of Biomedical Engineering, National Yang-Ming University, Taipei, Taiwan.
Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan.
Ear Hear. 2018 Jul/Aug;39(4):795-809. doi: 10.1097/AUD.0000000000000537.
We investigate the clinical effectiveness of a novel deep learning-based noise reduction (NR) approach for Mandarin-speaking cochlear implant (CI) recipients under challenging noise types at low signal-to-noise ratio (SNR) levels.
The deep learning-based NR approach used in this study consists of two modules, a noise classifier (NC) and a deep denoising autoencoder (DDAE), and is therefore termed NC + DDAE. In a series of experiments, we conduct qualitative and quantitative analyses of the NC module and of the overall NC + DDAE approach. We also evaluate the speech recognition performance of Mandarin-speaking CI recipients with the NC + DDAE NR approach and with classical single-microphone NR approaches under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise and construction jackhammer noise, at 0 and 5 dB SNR. Two conventional NR techniques and the proposed deep learning-based approach are used to process the noisy utterances. We qualitatively compare the NR approaches using amplitude envelope and spectrogram plots of the processed utterances. Quantitative measures include (1) the normalized covariance measure, an objective estimate of the intelligibility of the utterances processed by each NR approach; and (2) speech recognition tests conducted with nine Mandarin-speaking CI recipients, who used their own clinical speech processors during testing.
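As an illustration of how noisy test utterances at a fixed SNR such as 0 or 5 dB can be constructed, the following is a minimal sketch that scales a masker so the speech-to-noise power ratio matches a target value before adding it to the clean signal. The function names `mix_at_snr` and `measured_snr_db`, the looping of the masker to the speech length, and the synthetic signals are assumptions made for this example; the abstract does not describe the authors' actual mixing procedure.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the target SNR in dB.

    Hypothetical helper: average power over the whole utterance is used,
    which is one common convention, not necessarily the one in the study.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Repeat or trim the masker so that it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise

def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    speech = np.asarray(speech, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))

# Synthetic stand-ins for a clean sentence and a masker (assumed data).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
masker = rng.standard_normal(4000)
noisy_0db = mix_at_snr(clean, masker, 0.0)
noisy_5db = mix_at_snr(clean, masker, 5.0)
```

Computing the gain from the trimmed masker, rather than from the original masker file, keeps the realized SNR equal to the requested value for each individual utterance.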
The results of the objective evaluation and the listening tests indicate that, under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two classical NR techniques under both matched and mismatched training-testing conditions.
Compared with the two well-known conventional NR techniques under challenging listening conditions, the proposed NC + DDAE NR approach suppresses noise more effectively and introduces less distortion to the key speech envelope information, thereby improving speech recognition more effectively for Mandarin-speaking CI recipients. The results suggest that the proposed deep learning-based NR approach can potentially be integrated into existing CI signal processors to overcome the degradation of speech perception caused by noise.