Na Youngmin, Joo Hyosung, Trang Le Thi, Quan Luong Do Anh, Woo Jihwan
Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea.
Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea.
Front Neurosci. 2022 Aug 18;16:906616. doi: 10.3389/fnins.2022.906616. eCollection 2022.
Auditory prostheses provide an opportunity for the rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which an auditory prosthesis improves the user's speech comprehension. Although behavioral speech intelligibility testing is the gold standard, its subjectivity limits precise evaluation. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by 2-, 3-, 4-, 5-, and 8-channel vocoders were used to set relatively low speech intelligibility conditions. A Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100%, with a step of 2.5%. Three scores, namely 30.0, 37.5, and 40.0%, were not collected. The speech features, i.e., the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained on a dataset of event-related potentials (ERPs), or of correlation coefficients between the ERPs and the ENV, between the ERPs and the PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. While the informative electrodes of the ENV model were located in the occipital area, those of the phoneme models (PH and PHENV), according to the occlusion sensitivity maps, were located in the language-processing area. Of the models tested, the PHENV model achieved the best speech intelligibility prediction accuracy. This model may facilitate clinical prediction of speech intelligibility with a comfortable speech intelligibility test.
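The correlation-based input features described in the abstract (one coefficient per electrode between the EEG response and a speech feature) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, array shapes, and the use of Pearson correlation over the whole epoch are assumptions.

```python
import numpy as np

def correlation_features(erps, feature):
    """Hypothetical sketch: Pearson correlation between each electrode's
    ERP and a speech feature (e.g., the temporal envelope ENV, the PH
    onset train, or their product PHENV), yielding one coefficient per
    electrode to feed a CNN classifier.

    erps:    (n_electrodes, n_samples) averaged EEG responses
    feature: (n_samples,) speech feature time series
    """
    # Z-score the feature and each electrode's ERP (population std)
    f_z = (feature - feature.mean()) / feature.std()
    e_z = (erps - erps.mean(axis=1, keepdims=True)) / erps.std(axis=1, keepdims=True)
    # Mean of the product of z-scores = Pearson r, per electrode
    return e_z @ f_z / feature.size  # shape: (n_electrodes,)

# Illustrative shapes only: 64 electrodes, 1000 time samples
rng = np.random.default_rng(0)
erps = rng.standard_normal((64, 1000))
env = rng.standard_normal(1000)
r = correlation_features(erps, env)
```

Each vocoder condition would then contribute a 64-dimensional correlation vector (or a topographic map) as one training sample, which is consistent with the occlusion-sensitivity analysis operating over electrode locations.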