Chu Kevin, Collins Leslie, Mainsah Boyla
Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA.
Proc IEEE Int Conf Acoust Speech Signal Process. 2020 May;2020:6929-6933. doi: 10.1109/icassp40776.2020.9054450. Epub 2020 May 14.
Cochlear implant (CI) users experience substantial difficulty understanding reverberant speech. A previous study proposed a strategy that leverages automatic speech recognition (ASR) to recognize reverberant speech and speech synthesis to translate the recognized text into anechoic speech. However, that strategy was trained and tested in the same reverberant environment, so it was unknown whether it is robust to unseen environments. Thus, the current study investigated the performance of the previously proposed algorithm in multiple unseen environments. First, an ASR system was trained on anechoic and reverberant speech using different room types. Next, a speech synthesizer was trained to generate speech from the text predicted by the ASR system. Experiments were conducted with normal-hearing listeners using vocoded speech, and the results showed that the strategy improved speech intelligibility in previously unseen conditions. These results suggest that the ASR-synthesis strategy can potentially benefit CI users in everyday reverberant environments.
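The two-stage strategy in the abstract can be sketched as a simple composition: reverberant audio is first mapped to text by an ASR front end, and the text is then resynthesized as anechoic speech. The sketch below is a minimal illustration of that data flow only; `toy_asr` and `toy_tts` are hypothetical stand-ins, not the trained models used in the study.

```python
from typing import Callable, List

def make_pipeline(recognize: Callable[[List[float]], str],
                  synthesize: Callable[[str], List[float]]
                  ) -> Callable[[List[float]], List[float]]:
    """Compose ASR and synthesis: reverberant audio -> text -> anechoic audio."""
    def enhance(reverberant_audio: List[float]) -> List[float]:
        text = recognize(reverberant_audio)  # stage 1: ASR on reverberant input
        return synthesize(text)              # stage 2: resynthesize from text
    return enhance

# Toy stand-ins that only illustrate the interfaces (not real models).
def toy_asr(audio: List[float]) -> str:
    return "hello world"                     # a real system returns decoded text

def toy_tts(text: str) -> List[float]:
    # Dummy waveform: 10 samples per word, in place of a neural synthesizer.
    return [0.0] * (10 * len(text.split()))

enhance = make_pipeline(toy_asr, toy_tts)
output = enhance([0.1, -0.2, 0.3])           # 2 words -> 20 dummy samples
```

Because the synthesizer conditions only on the predicted text, any reverberation in the input is discarded by construction, which is why robustness hinges on the ASR stage generalizing to unseen rooms.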