Department of Neuroscience and Department of Electrical Engineering, KU Leuven, Leuven, Vlaams Brabant, 3000, Belgium.
Department of Electrical Engineering, KU Leuven, Leuven, Vlaams Brabant, 3000, Belgium.
J Neural Eng. 2021 Nov 15;18(6). doi: 10.1088/1741-2552/ac33e9.
Currently, only behavioral speech understanding tests are available, which require active participation of the person being tested. As this is infeasible for certain populations, an objective measure of speech intelligibility is required. Recently, brain imaging data have been used to establish a relationship between stimulus and brain response. Linear models have been successfully linked to speech intelligibility but require per-subject training. We present a deep-learning-based model incorporating dilated convolutions that operates in a match/mismatch paradigm. The accuracy of the model's match/mismatch predictions can be used as a proxy for speech intelligibility without subject-specific (re)training.

We evaluated the performance of the model as a function of input segment length, electroencephalography (EEG) frequency band and receptive field size, comparing it to multiple baseline models. Next, we evaluated performance on held-out data and the effect of finetuning. Finally, we established a link between the accuracy of our model and the state-of-the-art behavioral MATRIX test.

The dilated convolutional model significantly outperformed the baseline models for every input segment length, for all EEG frequency bands except the delta and theta bands, and for receptive field sizes between 250 and 500 ms. Additionally, finetuning significantly increased the accuracy on a held-out dataset. Finally, a significant correlation (r = 0.59, p = 0.0154) was found between the speech reception threshold (SRT) estimated using the behavioral MATRIX test and the SRT estimated using our objective method.

Our method is the first to predict the SRT from EEG for unseen subjects, contributing to objective measures of speech intelligibility.
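To make the two quantities in the abstract concrete (receptive field size in milliseconds and match/mismatch accuracy), the following is a minimal sketch, not the authors' implementation. The kernel size, dilation factors, 64 Hz sampling rate and the function names are illustrative assumptions that do not appear in the abstract. The receptive field formula is the standard one for stacked stride-1 dilated convolutions.

```python
def receptive_field_samples(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def receptive_field_ms(kernel_size, dilations, sample_rate_hz):
    """Convert the receptive field from samples to milliseconds."""
    return 1000.0 * receptive_field_samples(kernel_size, dilations) / sample_rate_hz


def match_mismatch_accuracy(matched_scores, mismatched_scores):
    """Fraction of trials in which the matched speech segment scores higher.

    Each trial pairs one EEG segment with two candidate speech segments:
    the one that was actually presented (matched) and a decoy (mismatched).
    The scores are whatever similarity the model assigns to each pairing.
    """
    if len(matched_scores) != len(mismatched_scores):
        raise ValueError("need one matched and one mismatched score per trial")
    correct = sum(m > mm for m, mm in zip(matched_scores, mismatched_scores))
    return correct / len(matched_scores)


# Hypothetical configuration (assumed for illustration, not from the abstract):
# three layers, kernel size 3, dilations 1, 3, 9, EEG sampled at 64 Hz.
rf_samples = receptive_field_samples(3, [1, 3, 9])
rf_ms = receptive_field_ms(3, [1, 3, 9], 64)

# Made-up similarity scores for three trials.
accuracy = match_mismatch_accuracy([0.9, 0.2, 0.7], [0.1, 0.4, 0.3])
```

Under these assumed settings the receptive field is 27 samples, or about 422 ms, which falls inside the 250 to 500 ms range the abstract reports as favourable. The accuracy function reflects the idea that a model choosing the matched segment more often than chance is tracking the speech, which is what allows accuracy to serve as a proxy for intelligibility.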