Department of Otolaryngology, Medical University Hannover and Cluster of Excellence Hearing4all, Hannover, 30625, Germany.
J Acoust Soc Am. 2018 Jun;143(6):3602. doi: 10.1121/1.5042056.
Severe hearing loss can be treated with a surgically implanted electrical device called a cochlear implant (CI). CI users struggle to perceive complex audio signals such as music; however, previous studies show that CI recipients find music more enjoyable when the vocals are enhanced with respect to the background music. In this manuscript, source separation (SS) algorithms are used to remix pop songs by applying gain to the lead singing voice. Four approaches are compared: deep convolutional auto-encoders, a deep recurrent neural network, a multilayer perceptron (MLP), and non-negative matrix factorization. They are evaluated objectively and subjectively through two different perceptual experiments involving normal-hearing subjects and CI recipients. The evaluation assesses the relevance of the artifacts introduced by the SS algorithms while taking their computation time into account, as this study aims to propose one of the algorithms for real-time implementation. Results show that the MLP performs robustly across the tested data while keeping distortions and artifacts at levels that are not perceived by CI users. Thus, an MLP is proposed for real-time monaural audio SS to remix music for CI users.
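The remixing step described above, applying gain to the separated lead voice before adding the accompaniment back, can be illustrated with a minimal sketch. It assumes the SS algorithm has already produced two time-aligned mono stems as lists of float samples in the range [-1, 1]. The function names, the 6 dB default gain, and the peak normalization are hypothetical choices made for illustration and are not taken from the study.

```python
def db_to_linear(gain_db):
    """Convert an amplitude gain in decibels to a linear factor."""
    return 10.0 ** (gain_db / 20.0)


def remix_with_vocal_gain(vocals, accompaniment, vocal_gain_db=6.0):
    """Remix two separated stems, boosting the vocals by vocal_gain_db.

    Both stems are assumed to be equal-length sequences of float samples.
    The 6 dB default is an illustrative value, not one from the study.
    """
    if len(vocals) != len(accompaniment):
        raise ValueError("stems must have the same number of samples")

    gain = db_to_linear(vocal_gain_db)

    # Sample-wise sum of the amplified vocals and the unchanged accompaniment.
    remix = [gain * v + a for v, a in zip(vocals, accompaniment)]

    # Assumed safeguard: rescale only if the boosted mix would clip.
    peak = max((abs(x) for x in remix), default=0.0)
    if peak > 1.0:
        remix = [x / peak for x in remix]
    return remix


# Usage with tiny hypothetical stems (real stems would come from an SS model).
vocals = [0.10, -0.20, 0.30]
accompaniment = [0.05, 0.05, -0.10]
remix = remix_with_vocal_gain(vocals, accompaniment, vocal_gain_db=6.0)
```

In this sketch, any artifacts that the separation leaves in the vocal stem are amplified by the same gain, which is one reason the perceptual relevance of those artifacts matters for the remix.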