Archean LABS, Montauban, France.
Service d'Oto-Rhino-Laryngologie, d'Oto-Neurologie et d'ORL Pédiatrique, Centre Hospitalier Universitaire de Toulouse, France.
Trends Hear. 2020 Jan-Dec;24:2331216520914769. doi: 10.1177/2331216520914769.
The objective of this study was to provide proof of concept that the speech intelligibility in quiet of unaided older hearing-impaired (OHI) listeners can be predicted by automatic speech recognition (ASR). Twenty-four OHI listeners completed three speech-identification tasks using speech materials of varying linguistic complexity and predictability (i.e., logatoms, words, and sentences). An ASR system was first trained on different speech materials and then used to recognize the same speech stimuli that were presented to the listeners, after those stimuli had been processed to mimic some of the perceptual consequences of the age-related hearing loss experienced by each listener: the elevation of hearing thresholds (by linear filtering), the loss of frequency selectivity (by spectral smearing), and loudness recruitment (by raising the amplitude envelope to a power). Regardless of the size of the lexicon used in the ASR system, strong to very strong correlations were observed between human and machine intelligibility scores. However, large root-mean-square errors (RMSEs) were observed for all conditions. The simulation of frequency selectivity loss adversely affected both the strength of the correlation and the RMSE. The highest correlations and smallest RMSEs were found for logatoms, suggesting that the prediction system mostly reflects the functioning of the peripheral part of the auditory system. For sentences, the prediction of human intelligibility was significantly improved by taking cognitive performance into account. This study demonstrates for the first time that ASR, even when trained on intact independent speech material, can be used to estimate trends in speech intelligibility of OHI listeners.
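The abstract reports strong correlations together with large RMSEs, two findings that may look contradictory at first. The minimal sketch below shows how both can hold at once: the correlation measures whether the machine ranks listeners the way the human scores do, while the RMSE measures how far the predicted scores are from the observed ones. The scores are hypothetical values chosen for illustration, not data from the study, and the function names `pearson_r` and `rmse` are assumptions of this sketch rather than anything specified by the authors.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root-mean-square error between observed and predicted scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical percent-correct scores for five listeners (NOT study data).
human = [90.0, 80.0, 70.0, 60.0, 50.0]
# Hypothetical machine scores: same ordering, but 30 points lower throughout.
machine = [60.0, 50.0, 40.0, 30.0, 20.0]

r = pearson_r(human, machine)
err = rmse(human, machine)
print(f"r = {r:.2f}, RMSE = {err:.1f} points")
```

In this constructed case the machine scores track the human scores perfectly in rank and spacing, so the correlation is maximal, yet every prediction is offset by a constant amount, so the RMSE is large. This is one way a predictor can capture trends in intelligibility without matching absolute scores.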