Department of Otolaryngology, New York University Grossman School of Medicine, 550 First Avenue, New York, NY, 10016, USA.
Sci Rep. 2024 Oct 14;14(1):24056. doi: 10.1038/s41598-024-73173-6.
Acoustic vocoders play a key role in simulating the speech information available to cochlear implant (CI) users. Traditionally, the intelligibility of vocoder CI simulations is assessed through speech recognition experiments with normal-hearing subjects, a process that can be time-consuming, costly, and subject to individual variability. As an alternative approach, we used an advanced deep learning speech recognition model to investigate the intelligibility of CI simulations. We evaluated the model's performance on vocoder-processed words and sentences while varying the vocoder parameters. The number of vocoder bands, the frequency range, and the envelope dynamic range were adjusted to simulate sound processing settings in CI devices. Additionally, we manipulated the low-pass cutoff frequency and the intensity quantization of the vocoder envelopes to simulate the psychophysical temporal and intensity resolution of CI patients. The results were interpreted in the context of the audio analysis performed within the model. Interestingly, the deep learning model, although not originally designed to mimic human speech processing, responded to changes in vocoder parameters in a human-like way, consistent with existing results from human subjects. This approach offers significant time and cost savings compared with testing human subjects, and it eliminates learning and fatigue effects during testing. Our findings demonstrate the potential of speech recognition models to facilitate auditory research.
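The vocoder manipulations named in the abstract (number of bands, frequency range, envelope cutoff, envelope dynamic range, and intensity quantization) can be illustrated with a minimal noise-vocoder sketch. This is not the authors' implementation: the function names, the brick-wall FFT filters, the log-spaced band edges, the noise carrier, and all default parameter values are assumptions chosen only to keep the example short and dependent on NumPy alone.

```python
import numpy as np


def fft_bandpass(x, fs, lo, hi):
    """Keep only spectral components in [lo, hi) Hz (ideal brick-wall filter; an assumption)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def band_edges(f_lo, f_hi, n_bands):
    """Log-spaced band edges (hypothetical spacing; other spacings are equally possible)."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)


def quantize_envelope(env, n_steps, dynamic_range_db):
    """Limit env to dynamic_range_db below its peak, then round to n_steps levels on a dB scale."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    peak = env.max()
    if peak <= 0:
        return np.zeros_like(env)
    floor = peak * 10.0 ** (-dynamic_range_db / 20.0)
    level_db = 20.0 * np.log10(np.clip(env, floor, peak) / peak)  # in [-DR, 0]
    step = dynamic_range_db / (n_steps - 1)
    return peak * 10.0 ** (np.round(level_db / step) * step / 20.0)


def noise_vocode(x, fs, n_bands=8, f_lo=200.0, f_hi=7000.0,
                 env_cutoff=50.0, n_steps=None, dynamic_range_db=40.0, seed=0):
    """Noise-vocode signal x; all defaults are illustrative, not taken from the paper."""
    if f_hi > fs / 2:
        raise ValueError("f_hi must not exceed the Nyquist frequency")
    rng = np.random.default_rng(seed)
    edges = band_edges(f_lo, f_hi, n_bands)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_bandpass(x, fs, lo, hi)
        # Envelope: full-wave rectification followed by a low-pass at env_cutoff.
        env = np.maximum(fft_bandpass(np.abs(band), fs, 0.0, env_cutoff), 0.0)
        if n_steps is not None:
            env = quantize_envelope(env, n_steps, dynamic_range_db)
        # Carrier: white noise restricted to the same band.
        carrier = fft_bandpass(rng.standard_normal(len(x)), fs, lo, hi)
        # Re-filter after modulation so sidebands stay inside the band.
        out += fft_bandpass(env * carrier, fs, lo, hi)
    return out
```

Under these assumptions, a call such as `noise_vocode(x, fs, n_bands=4, env_cutoff=16.0, n_steps=8)` would coarsen spectral, temporal, and intensity resolution together; the vocoded output could then be passed to a speech recognition model and scored against the reference transcript.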