Center for Language and Brain, HSE University, Moscow, Russia.
School of Data Analysis and Artificial Intelligence, Faculty of Computer Science, Moscow, Russia.
PLoS One. 2023 Nov 22;18(11):e0292047. doi: 10.1371/journal.pone.0292047. eCollection 2023.
This paper presents the results of our research toward the following objectives: (i) to introduce a novel multi-source data set that addresses the shortcomings of previous data sets, (ii) to propose a robust artificial intelligence (AI)-based solution for identifying dyslexia in primary school pupils, and (iii) to examine our psycholinguistic knowledge by studying which features our best AI model relies on to identify dyslexia. To achieve the first objective, we collected and annotated a new set of eye-movement-during-reading data. We also collected demographic data, including a measure of non-verbal intelligence, to form our three data sources. Our data set is the largest eye-movement data set in the world. Unlike previously introduced binary-class data sets, it contains (A) three class labels and (B) reading speed. For the second objective, we formulated dyslexia prediction as both a regression and a classification problem and examined the performance of 12 classification and eight regression approaches. We used Bayesian optimization to tune the models' hyperparameters and reported the mean and standard deviation of our evaluation metrics over stratified ten-fold cross-validation. Our results showed that multi-layer perceptron, random forest, gradient boosting, and k-nearest neighbor form the best-performing group. Moreover, we showed that although no single data source on its own led to accurate results, combining them yielded a reliable solution. We also determined the feature importances of our best classifier: IQ, gender, and age were the three most important features, and fixation along the y-axis was more important than the other fixation data.
Keywords: Dyslexia detection, eye fixation, eye movement, demographic, classification, regression, artificial intelligence.
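The evaluation protocol described above (stratified ten-fold cross-validation, reporting the mean and standard deviation of a metric across folds) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline and makes several assumptions: the data are synthetic three-class Gaussian clusters standing in for the real eye-movement and demographic features, the classifier is a plain k-nearest-neighbor vote with a fixed `n_neighbors` (no Bayesian hyperparameter search), accuracy is the only metric, the sample standard deviation is used, and all helper names (`stratified_k_fold`, `knn_predict`, `cross_validate`, `make_toy_data`) are hypothetical.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Return k (train, test) index splits whose test folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin, continuing where the previous
        # class stopped so that fold sizes stay balanced.
        for j, i in enumerate(indices):
            folds[(offset + j) % k].append(i)
        offset += len(indices)
    splits = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, sorted(test)))
    return splits


def knn_predict(train_X, train_y, x, n_neighbors=5):
    """Majority vote among the n_neighbors training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k=10, n_neighbors=5, seed=0):
    """Mean and sample standard deviation of per-fold accuracy (assumes len(y) >= k)."""
    scores = []
    for train, test in stratified_k_fold(y, k, seed):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], n_neighbors) == y[i] for i in test)
        scores.append(correct / len(test))
    return statistics.mean(scores), statistics.stdev(scores)


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic three-class 2-D Gaussian clusters (placeholder for real features)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mean_acc, std_acc = cross_validate(X, y)
    print(f"accuracy over 10 stratified folds: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Dealing each class's shuffled indices round-robin across the folds keeps every test fold's class proportions close to those of the whole data set, which matters when one class is much rarer than the others. In a fuller setup, the hyperparameter search would run inside each training split so that the test fold never influences tuning.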