Sun Wenhui, Zou Jiajie, Zhu Tianyi, Sun Zhoujian, Ding Nai
Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou 311121, China.
Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China.
iScience. 2024 May 22;27(6):110055. doi: 10.1016/j.isci.2024.110055. eCollection 2024 Jun 21.
Humans can quickly adapt to recognize acoustically degraded speech, and here we hypothesize that this quick adaptation is enabled by internal linguistic feedback: listeners use partially recognized sentences to adapt the mapping between acoustic features and phonetic labels. We test this hypothesis by quantifying how quickly humans adapt to degraded speech and analyzing whether the adaptation process can be simulated by adapting an automatic speech recognition (ASR) system based on its own speech recognition results. We consider three types of acoustic degradation, i.e., noise vocoding, time compression, and local time-reversal. The human speech recognition rate can increase by >20% after exposure to just a few acoustically degraded sentences. Critically, the ASR system with internal linguistic feedback can adapt to degraded speech with human-level speed and accuracy. These results suggest that self-supervised learning based on linguistic feedback is a plausible strategy for human adaptation to acoustically degraded speech.
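The abstract describes adapting an ASR system on its own recognition output (pseudo-labels) after exposure to degraded sentences. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: it assumes a PyTorch ASR model and user-supplied `transcribe` and `loss_fn` callables, and it includes an illustrative local time-reversal degradation with an assumed segment length.

```python
# Hypothetical sketch of self-supervised adaptation driven by the model's own
# transcripts (pseudo-labels). Model, tokenizer, and loss are placeholders;
# any pretrained ASR model with a differentiable loss could be substituted.

import numpy as np
import torch


def local_time_reversal(signal: np.ndarray, sr: int, segment_ms: float = 60.0) -> np.ndarray:
    """One of the three degradations named in the abstract: reverse the
    waveform within short consecutive segments (the segment length here is
    an assumed parameter, not taken from the paper)."""
    seg = max(1, int(sr * segment_ms / 1000.0))
    out = signal.copy()
    for start in range(0, len(signal), seg):
        out[start:start + seg] = out[start:start + seg][::-1]
    return out


def adapt_on_own_transcripts(model, degraded_utterances, transcribe, loss_fn,
                             lr: float = 1e-5, passes: int = 1):
    """Pseudo-label adaptation loop: transcribe each degraded sentence with the
    current model, treat that (partially correct) transcript as the target,
    and take a gradient step. `transcribe` and `loss_fn` are assumed callables
    (e.g., greedy decoding and a CTC loss)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(passes):
        for audio in degraded_utterances:
            with torch.no_grad():
                pseudo_label = transcribe(model, audio)   # internal linguistic feedback
            loss = loss_fn(model, audio, pseudo_label)    # supervise on the model's own output
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

The key design point this illustrates is that no ground-truth transcripts are needed: the adaptation signal comes entirely from the recognizer's own (imperfect) output on the degraded sentences.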