Machine Learning Prediction of Autism Spectrum Disorder From a Minimal Set of Medical and Background Information.

Affiliations

Center of Neurodevelopmental Disorders, Centre for Psychiatry Research, Department of Women's and Children's Health, Karolinska Institutet, Solna, Sweden.

Department of Highly Specialized Pediatric Orthopedics and Medicine, Astrid Lindgren Children's Hospital, Karolinska University Hospital, Region Stockholm, Stockholm, Sweden.

Publication information

JAMA Netw Open. 2024 Aug 1;7(8):e2429229. doi: 10.1001/jamanetworkopen.2024.29229.

Abstract

IMPORTANCE

Early identification of the likelihood of autism spectrum disorder (ASD) using minimal information is crucial for early diagnosis and intervention, which can affect developmental outcomes.

OBJECTIVE

To develop and validate a machine learning (ML) model for predicting ASD using a minimal set of features from background and medical information and to evaluate the predictors and the utility of the ML model.

DESIGN, SETTING, AND PARTICIPANTS

For this diagnostic study, a retrospective analysis of the Simons Foundation Powering Autism Research for Knowledge (SPARK) database, version 8 (released June 6, 2022), was conducted, including data from 30 660 participants after adjustments for missing values and class imbalances (15 330 with ASD and 15 330 without ASD). The SPARK database contains participants recruited from 31 university-affiliated research clinics and online in 26 states in the US. All individuals with a professional ASD diagnosis and their families were eligible to participate. The model performance was validated on independent datasets from SPARK, version 10 (released July 21, 2023), and the Simons Simplex Collection (SSC), consisting of 14 790 participants, followed by analysis of phenotypic associations.
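The abstract reports equal group sizes after adjusting for class imbalance but does not say how the balancing was done. As a rough illustration only, the sketch below shows random undersampling, one common way to equalize classes. The function name `undersample`, the `asd` label key, and the toy records are all hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def undersample(records, label_key="asd", seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)

    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Toy data: 7 positive and 3 negative records (made up, not SPARK data).
toy = [{"id": i, "asd": 1} for i in range(7)]
toy += [{"id": i, "asd": 0} for i in range(7, 10)]

balanced = undersample(toy)
counts = {label: sum(r["asd"] == label for r in balanced) for label in (0, 1)}
print(counts)  # {0: 3, 1: 3}
```

Undersampling discards data from the larger class; oversampling or class weighting are alternatives, and the abstract does not indicate which approach the authors used.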

EXPOSURES

Twenty-eight basic medical screening and background history items present before 24 months of age.

MAIN OUTCOMES AND MEASURES

Generalizable ML prediction models were developed for detecting ASD using 4 algorithms (logistic regression, decision tree, random forest, and eXtreme Gradient Boosting [XGBoost]). Performance metrics included accuracy, area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and F1 score, offering a comprehensive assessment of the predictive accuracy of the model. Explainable AI methods were applied to determine the effect of individual features in predicting ASD as secondary outcomes, enhancing the interpretability of the best-performing model. The secondary outcome analyses were further complemented by examining differences in various phenotypic measures using nonparametric statistical methods, providing insights into the ability of the model to differentiate between different presentations of ASD.
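For readers less familiar with these metrics, the following minimal sketch computes each of them from binary labels and model scores using only the Python standard library. It is not the study's evaluation code. The toy labels, the scores, and the 0.5 decision threshold are assumptions chosen for illustration, and AUROC is computed with the pairwise (Mann-Whitney) formulation.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics)
print(auc)  # 0.8125
```

AUROC is threshold-free, whereas sensitivity, specificity, PPV, and F1 all depend on the chosen cutoff, which is why the two kinds of metric are usually reported together.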

RESULTS

The study included 19 477 (63.5%) male and 11 183 (36.5%) female participants (mean [SD] age, 106 [62] months). The mean (SD) age was 113 (68) months for the ASD group and 100 (55) months for the non-ASD group. The XGBoost (termed AutMedAI) model demonstrated strong performance with an AUROC score of 0.895, sensitivity of 0.805, specificity of 0.829, and PPV of 0.897. Developmental milestones and eating behavior were the most important predictors. Validation on independent cohorts showed an AUROC of 0.790, indicating good generalizability.
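Two pieces of arithmetic help when reading these results. The sketch below first confirms that the reported sex counts sum to the stated sample size and reproduce the stated percentages. It then uses Bayes' rule to show how PPV varies with the share of ASD cases in the sample being tested, given the reported sensitivity and specificity. The prevalence values of 0.50 and 0.03 are hypothetical illustrations; the abstract does not state the case mix of the evaluation split.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule for a given prevalence in the tested sample."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Counts reported in the abstract.
n_male, n_female = 19_477, 11_183
n_total = n_male + n_female
pct_male = round(100 * n_male / n_total, 1)
pct_female = round(100 * n_female / n_total, 1)
print(n_total, pct_male, pct_female)  # 30660 63.5 36.5

# Reported sensitivity and specificity; the prevalences are hypothetical.
for prevalence in (0.50, 0.03):
    ppv = ppv_from_rates(0.805, 0.829, prevalence)
    print(f"prevalence={prevalence:.2f} -> PPV={ppv:.3f}")
```

Because PPV falls as prevalence falls, a PPV measured in a case-enriched research cohort should not be assumed to carry over to population screening without recalculation.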

CONCLUSIONS AND RELEVANCE

In this diagnostic study of ML prediction of ASD, robust model performance was observed to identify autistic individuals with more symptoms and lower cognitive levels. The robustness and ML model generalizability results are promising for further validation and use in clinical and population settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/989d/11333987/3c73e7ffabbb/jamanetwopen-e2429229-g001.jpg
