Wang Yen-Chin, Cheng Chung-Yuan, Wu Chi-Shin, Lee Chi-Chun, Gau Susan Shur-Fen
Department of Psychiatry, National Taiwan University Hospital and College of Medicine, Taipei, Taiwan.
Department of Psychiatry, National Taiwan University Hospital, Hsin-Chu branch, Hsinchu, Taiwan.
Autism. 2025 Aug 5:13623613251360271. doi: 10.1177/13623613251360271.
Machine-learning models can assist in diagnosing autism but have biases. We examined the correlates of misclassifications and how training data affect model generalizability. Social Responsiveness Scale data were collected from two cohorts in Taiwan: the clinical cohort comprised 1203 autistic participants and 1182 non-autistic comparisons, and the community cohort consisted of 35 autistic participants and 3297 non-autistic comparisons. Classification models were trained, and misclassified cases were examined for associations with sex, age, intelligence quotient (IQ), symptoms on the Child Behavior Checklist (CBCL), and co-occurring psychiatric diagnoses. Models showed high within-cohort accuracy (clinical: sensitivity 0.91-0.95, specificity 0.93-0.94; community: sensitivity 0.91-1.00, specificity 0.89-0.96), but generalizability across cohorts was limited. When the community-trained model was applied to the clinical cohort, performance declined (sensitivity 0.65, specificity 0.95). In both models, non-autistic individuals misclassified as autistic showed elevated behavioral symptoms and a higher prevalence of attention-deficit hyperactivity disorder (ADHD). Conversely, autistic individuals who were misclassified tended to show fewer behavioral symptoms and, in the community model, higher IQ and more aggressive behavior but fewer social and attention problems. Error patterns of machine-learning models and the impact of training data warrant careful consideration in future research.

Lay Abstract

Machine learning is a type of computer model that can help identify patterns in data and make predictions. In autism research, these models may support earlier or more accurate identification of autistic individuals. But to be useful, they need to make reliable predictions across different groups of people. In this study, we explored when and why these models might make mistakes, and how the kind of data used to train them affects their accuracy.
Training a model means using information to teach the computer model how to tell the difference between autistic and non-autistic individuals. We used information from the Social Responsiveness Scale (SRS), a questionnaire that measures autistic features. We tested these models on two different groups: one from clinical settings and one from the general community. The models worked well when tested within the same type of group they were trained on. However, a model trained on the community group did not perform as accurately when tested on the clinical group. Sometimes the models got it wrong. For example, in the clinical group, some autistic individuals were mistakenly identified as non-autistic; these individuals tended to have fewer emotional or behavioral difficulties. In the community group, autistic individuals who were mistakenly identified as non-autistic had higher IQs and showed more aggressive behaviors but fewer attention or social problems. Conversely, some non-autistic people were incorrectly identified as autistic; these people had more emotional or behavioral challenges and were more likely to have attention-deficit hyperactivity disorder (ADHD). These findings highlight that machine-learning models are sensitive to the type of data they are trained on. To build fair and accurate models for predicting autism, it is essential to consider where the training data come from and whether they represent the full diversity of individuals. Understanding these patterns of error can help improve future tools used in both research and clinical care.
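The cross-cohort drop in sensitivity described above can be illustrated with a toy sketch. All scores, cutoffs, and cohort sizes below are synthetic stand-ins, not the authors' actual SRS data or classification models: a decision cutoff tuned on a community sample with well-separated scores misses autistic cases in a clinical sample whose scores fall below that cutoff, while specificity stays high.

```python
# Toy illustration (synthetic data): a threshold classifier tuned on one
# cohort loses sensitivity on a cohort with a different score distribution.

def sensitivity_specificity(labels, preds):
    """labels/preds: 1 = autistic, 0 = non-autistic."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    tn = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(scores, labels):
    """Pick the cutoff maximising balanced accuracy on the training cohort."""
    best_t, best_bal = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        sens, spec = sensitivity_specificity(labels, preds)
        bal = (sens + spec) / 2
        if bal > best_bal:
            best_t, best_bal = t, bal
    return best_t

# Synthetic "community" training cohort: scores are well separated.
community_scores = [10, 12, 15, 18, 20, 70, 75, 80]
community_labels = [0, 0, 0, 0, 0, 1, 1, 1]

# Synthetic "clinical" test cohort: autistic scores overlap the cutoff region.
clinical_scores = [25, 30, 35, 40, 45, 55, 60, 72]
clinical_labels = [0, 0, 0, 0, 1, 1, 1, 1]

t = best_threshold(community_scores, community_labels)
preds = [1 if s >= t else 0 for s in clinical_scores]
sens, spec = sensitivity_specificity(clinical_labels, preds)
print(t, round(sens, 2), round(spec, 2))  # cutoff 70: sensitivity collapses, specificity stays at 1.0
```

The same asymmetry appears in the abstract's numbers: the community-trained model kept specificity (0.95) but lost sensitivity (0.65) on the clinical cohort, because autistic presentations in clinical settings sit closer to the decision boundary learned from community data.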