Mamun Nursadul, Khorram Soheil, Hansen John H L
Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical & Computer Engineering, The University of Texas at Dallas.
Interspeech. 2019 Sep;2019:4265-4269. doi: 10.21437/interspeech.2019-1850.
Attempts to develop speech enhancement algorithms with improved speech intelligibility for cochlear implant (CI) users have met with limited success. To improve speech enhancement methods for CI users, we propose to perform speech enhancement in a cochlear filter-bank feature space, a feature set designed specifically for CI users and based on CI auditory stimuli. We leverage a convolutional neural network (CNN) to extract both stationary and non-stationary components of environmental acoustics and speech. We propose three CNN architectures: (1) a vanilla CNN that directly generates the enhanced signal; (2) a spectral-subtraction-style CNN (SS-CNN) that first predicts the noise and then generates the enhanced signal by subtracting the predicted noise from the noisy signal; and (3) a Wiener-style CNN (Wiener-CNN) that generates an optimal mask for suppressing noise. An important drawback of the proposed networks is that they introduce considerable delays, which limit their real-time application for CI users. To address this, the study also considers causal variants of these networks. Our experiments show that the proposed networks, in both causal and non-causal forms, achieve significant improvement over existing baseline systems. We also found that the causal Wiener-CNN outperforms the other networks and yields the best overall envelope coefficient measure (ECM). The proposed algorithms represent a viable option for implementation on the CCi-MOBILE research platform as a pre-processor for CI users in naturalistic environments.
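The difference between the three output styles, and between causal and non-causal processing, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `conv1d_time` and `enhance`, the single shared temporal kernel, and the sigmoid used to bound the Wiener-style mask are all assumptions made for illustration, and a real network would stack learned convolutional layers over the cochlear filter-bank features.

```python
import numpy as np


def conv1d_time(x, kernel, causal):
    """Filter each feature channel along time with one shared 1-D kernel.

    x: array of shape (frames, channels); kernel: shape (k,), k odd.
    causal=True pads only the past, so output frame t uses frames <= t.
    causal=False pads both sides, so frame t also uses (k-1)/2 future
    frames, which is the source of algorithmic delay.
    (Hypothetical helper, standing in for one convolutional layer.)
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = np.pad(x, ((left, right), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for j in range(k):
        # Tap j of the kernel weights the frame at offset j - left from t.
        out += kernel[j] * padded[j:j + x.shape[0]]
    return out


def enhance(noisy, net_out, style):
    """Turn a network output into enhanced features for each style.

    'vanilla': the network output is the enhanced signal itself.
    'ss':      the network output is a noise estimate to subtract.
    'wiener':  the network output is squashed into a (0, 1) mask
               (the sigmoid is an assumption) and applied to the input.
    """
    if style == "vanilla":
        return net_out
    if style == "ss":
        return noisy - net_out
    if style == "wiener":
        mask = 1.0 / (1.0 + np.exp(-net_out))
        return mask * noisy
    raise ValueError(f"unknown style: {style}")
```

In this sketch the causal variant removes look-ahead by shifting all padding to the past, so each output frame can be computed as soon as its input frame arrives. The price is that the filter no longer sees future context.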