Siddique Faizaan, Lee Brian K
Department of Epidemiology and Biostatistics, School of Public Health, Drexel University, Philadelphia, PA, United States of America.
Conestoga High School, Berwyn, PA, United States of America.
Glob Epidemiol. 2024 Aug 29;8:100161. doi: 10.1016/j.gloepi.2024.100161. eCollection 2024 Dec.
The successful implementation and interpretation of machine learning (ML) models in epidemiological studies can be challenging without an extensive programming background. We provide a didactic example of machine learning for risk prediction in this study by determining whether early life factors could be useful for predicting adolescent psychopathology.
In total, 9643 adolescents aged 9-10 from the Adolescent Brain and Cognitive Development (ABCD) Study were included in the ML analysis to predict high Child Behavior Checklist (CBCL) scores (i.e., T-scores ≥ 60). ML models were constructed using a series of predictor combinations (prenatal, family history, sociodemographic) across 5 different algorithms. We assessed ML performance using sensitivity, specificity, F1-score, and area under the curve (AUC).
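To make the outcome definition and the four performance metrics concrete, the following is a minimal Python sketch. It is not the authors' code: the abstract does not state the software used, and the function names, the 0.5 classification cutoff, and the toy data are all illustrative assumptions. Only the T-score threshold of 60 comes from the text.

```python
def binarize_cbcl(t_scores, threshold=60):
    """Label a CBCL T-score as high (1) if it is >= threshold, else 0."""
    return [1 if t >= threshold else 0 for t in t_scores]


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1-score from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def auc_score(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the ABCD Study)
t_scores = [65, 58, 60, 45, 70, 59]
y_true = binarize_cbcl(t_scores)
scores = [0.9, 0.6, 0.4, 0.1, 0.8, 0.3]        # hypothetical predicted risks
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff
metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

Sensitivity, specificity, and F1-score depend on the chosen cutoff, whereas AUC summarizes ranking performance across all cutoffs, which is one reason to report them together when the outcome is relatively uncommon.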
A total of 1267 adolescents (13.1%) were found to have high CBCL scores. Across all 5 ML algorithms, family history factors (e.g., either parent had nervous breakdowns, trouble holding jobs/fights/police encounters, and counseling for mental issues) and sociodemographic covariates (e.g., maternal age, child's sex, caregiver income and caregiver education) tended to be better predictors of adolescent psychopathology. The most important prenatal predictors were unplanned pregnancy, birth complications, and pregnancy complications.
Our results suggest that including prenatal, family history, and sociodemographic factors in ML models can generate moderately accurate predictions of adolescent psychopathology. Issues associated with model overfitting, hyperparameter tuning, and random seed setting should be considered throughout model training, testing, and validation. Future early risk prediction models may improve with the inclusion of additional relevant covariates.
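The point about seed setting can be illustrated with a small sketch of a reproducible, stratified train/test split. This is an assumption-laden example rather than the study's procedure: the abstract does not describe how the data were partitioned, and the function name `stratified_split`, the 20% test fraction, and the seed value are hypothetical. The toy labels mimic the roughly 13% outcome prevalence reported above.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices into train/test sets, preserving class proportions.

    A dedicated random.Random(seed) instance makes the split reproducible
    without touching the global random state.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 13 high-CBCL cases out of 100 (hypothetical data)
labels = [1] * 13 + [0] * 87

train_a, test_a = stratified_split(labels, seed=42)
train_b, test_b = stratified_split(labels, seed=42)
test_positives = sum(labels[i] for i in test_a)
```

Fixing the seed makes results repeatable, but a single seed can also flatter or penalize a model by chance. Repeating the split over several seeds and comparing training with held-out performance is one simple way to gauge both seed sensitivity and overfitting.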