Speech recognition in echoic environments and the effect of aging and hearing impairment.

Author information

Ding Nai, Gao Jiaxin, Wang Jing, Sun Wenhui, Fang Mingxuan, Liu Xiaoling, Zhao Hua

Affiliations

College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.

Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, Zhejiang, China.

Publication information

Hear Res. 2023 Apr;431:108725. doi: 10.1016/j.heares.2023.108725. Epub 2023 Feb 26.

DOI:10.1016/j.heares.2023.108725

PMID:36931021

Abstract

Temporal modulations provide critical cues for speech recognition. When the temporal modulations are distorted, e.g., by reverberation, speech intelligibility drops, and the drop can be explained by the amount of distortion to the speech modulation spectrum, i.e., the spectrum of temporal modulations. Here, we test a condition in which speech is contaminated by a single echo. Speech is delayed by either 0.125 s or 0.25 s to create an echo, and these two conditions notch out the temporal modulations at 4 Hz or 2 Hz, respectively. We evaluate how well young and older listeners can recognize such echoic speech. For young listeners, the speech recognition rate is not influenced by the echo, even for the first echoic sentence they are exposed to. For older listeners, the speech recognition rate drops to less than 60% when listening to the first echoic sentence, but rapidly recovers to above 75% after exposure to a few sentences. Further analyses reveal that both age and hearing threshold influence the recognition of echoic speech for the older listeners. These results show that the recognition of echoic speech cannot be fully explained by distortions to the modulation spectrum, and suggest that the auditory system has mechanisms to effectively compensate for the influence of single echoes.
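
The link between echo delay and notch frequency can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that a single echo acts on the temporal envelope like a linear comb filter y(t) = x(t) + g·x(t − d) with an equal-amplitude echo (g = 1). Under that assumption the gain vanishes at odd multiples of 1/(2d), so the shorter delay of 0.125 s puts the first notch at 4 Hz and the longer delay of 0.25 s puts it at 2 Hz. The function names `comb_gain` and `notch_frequencies` are hypothetical and chosen only for this illustration.

```python
import cmath
import math


def comb_gain(freq_hz, delay_s, echo_gain=1.0):
    """Magnitude response of y(t) = x(t) + echo_gain * x(t - delay_s) at freq_hz.

    Assumption: the echo is modelled as a linear comb filter; echo_gain = 1.0
    (equal-amplitude echo) is an illustrative choice, not a value from the paper.
    """
    return abs(1.0 + echo_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))


def notch_frequencies(delay_s, max_hz):
    """Return the frequencies (2k + 1) / (2 * delay_s) that do not exceed max_hz.

    These are the frequencies where the comb-filter gain is minimal.
    """
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches


if __name__ == "__main__":
    # The two echo delays named in the abstract.
    for delay in (0.125, 0.25):
        first = notch_frequencies(delay, max_hz=16.0)[0]
        gain = comb_gain(first, delay)
        print(f"delay {delay} s: first notch at {first} Hz, gain {gain:.3f}")
```

With a weaker echo (`echo_gain` below 1) the notches become shallower but stay at the same frequencies, which is one reason the equal-amplitude case is a convenient worst case for this kind of sketch.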
