
Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants.

Author Information

Kang Yuyong, Zheng Nengheng, Meng Qinglin

Affiliations

Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China.

Pengcheng Laboratory, Shenzhen, China.

Publication Information

Front Med (Lausanne). 2021 Nov 8;8:740123. doi: 10.3389/fmed.2021.740123. eCollection 2021.

Abstract

The cochlea plays a key role in the transmission from acoustic vibration to neural stimulation upon which the brain perceives sound. A cochlear implant (CI) is an auditory prosthesis that replaces the damaged cochlear hair cells to achieve acoustic-to-neural conversion. However, the CI is a very coarse bionic imitation of the normal cochlea. The highly resolved time-frequency-intensity information transmitted by the normal cochlea, which is vital to high-quality auditory perception such as speech perception in challenging environments, cannot be guaranteed by CIs. Although CI recipients with state-of-the-art commercial CI devices achieve good speech perception in quiet backgrounds, they usually suffer from poor speech perception in noisy environments. Therefore, noise suppression or speech enhancement (SE) is one of the most important technologies for CI. In this study, we introduce recent progress in deep learning (DL), mostly neural network (NN)-based, SE front ends for CI, and discuss how the hearing properties of CI recipients could be utilized to optimize DL-based SE. In particular, different loss functions are introduced to supervise the NN training, and a set of objective and subjective experiments is presented. Results verify that CI recipients are more sensitive to the residual noise than to the SE-induced speech distortion, which has been common knowledge in CI research. Furthermore, speech reception threshold (SRT) in noise tests demonstrate that the intelligibility of the denoised speech can be significantly improved when the NN is trained with a loss function biased toward noise suppression, compared with a loss that gives equal attention to the noise residue and the speech distortion.
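As a minimal illustration of the kind of trade-off loss described above, the sketch below splits the spectral-magnitude error into a speech-distortion term (the estimate falls below the clean magnitude, i.e., speech has been over-suppressed) and a noise-residue term (the estimate exceeds the clean magnitude, i.e., noise is left in the output), and weights the noise-residue term more heavily to bias training toward stronger noise suppression. This is a hedged sketch, not the paper's exact formulation: the function name trade_off_loss, the weight beta, and its default value are illustrative assumptions.

```python
import torch

def trade_off_loss(est_mag: torch.Tensor,
                   clean_mag: torch.Tensor,
                   beta: float = 0.7) -> torch.Tensor:
    """Weighted loss splitting the magnitude error into speech-distortion
    and noise-residue terms (illustrative sketch, not the paper's code).

    est_mag, clean_mag: spectral magnitudes of shape (batch, freq, time).
    beta: weight on the noise-residue term; beta > 0.5 biases training
          toward stronger noise suppression (value chosen for illustration).
    """
    err = est_mag - clean_mag
    # Speech distortion: estimate falls below the clean magnitude,
    # i.e., part of the target speech has been suppressed.
    speech_distortion = torch.clamp(-err, min=0.0).pow(2).mean()
    # Noise residue: estimate exceeds the clean magnitude,
    # i.e., some noise energy remains in the enhanced output.
    noise_residue = torch.clamp(err, min=0.0).pow(2).mean()
    return (1.0 - beta) * speech_distortion + beta * noise_residue
```

In training, this loss would be computed between the NN's estimated magnitude (or masked noisy magnitude) and the clean reference; setting beta = 0.5 recovers an ordinary symmetric mean-squared error on the magnitudes, while beta > 0.5 penalizes residual noise more than speech distortion.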


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfb6/8606413/65d5dc5dab93/fmed-08-740123-g0001.jpg
