Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA.
Department of Mental Health, Meuhedet Health Services, Tel Aviv, Israel.
Eur Psychiatry. 2020 Feb 26;63(1):e22. doi: 10.1192/j.eurpsy.2020.17.
Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence that early diagnosis and intervention improve developmental course and outcome. The aim of the current study was to test whether machine learning (ML) models applied to electronic medical records (EMRs) can predict ASD early in life in a general population sample.
We used EMR data from a single Israeli Health Maintenance Organization, covering the parents of 1,397 children with ASD (ICD-9/10) and of 94,741 children without ASD, all born between January 1, 1997 and December 31, 2008. Routinely available parental sociodemographic information, parental medical histories, and prescribed medication data were used to generate features to train several ML algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing the area under the receiver operating characteristic curve (AUC; C-statistic), sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value [PPV]).
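The evaluation metrics named above can be sketched in a few lines of plain Python. This is only an illustration of the standard definitions, not the study's code: the function names (`confusion_counts`, `classification_metrics`, `auc`), the 0.5 decision threshold in the usage note, and the toy labels and scores are all assumptions introduced here. In a 10-fold cross-validation setting, such metrics would typically be computed on each held-out fold and then averaged.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent metrics from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),          # true positive rate
        "specificity": _ratio(tn, tn + fp),          # true negative rate
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "false_positive_rate": _ratio(fp, fp + tn),  # equals 1 - specificity
        "ppv": _ratio(tp, tp + fp),                  # precision
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC (C-statistic): probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case; ties count as 0.5.
    Uses a simple O(n_pos * n_neg) pairwise comparison."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

As a usage example on made-up data, one could threshold predicted scores with `y_pred = [int(s >= 0.5) for s in scores]`, then call `classification_metrics(*confusion_counts(y_true, y_pred))` and `auc(y_true, scores)`. The pairwise AUC is chosen for readability; for a cohort of this size a rank-based computation would be more practical.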
All ML models tested performed similarly. Averaged across all models, prediction of ASD in this dataset achieved a C-statistic of 0.709, sensitivity of 29.93%, specificity of 98.18%, accuracy of 95.62%, false positive rate of 1.81%, and PPV of 43.35%.
We conclude that ML algorithms combined with EMR data can capture early-life ASD risk and reveal previously unknown features associated with ASD risk. Such approaches may enable more accurate and efficient early detection of ASD in large populations of children.