Choong Casey, Xavier Neena, Falcon Beverly, Kan Hong, Lipkovich Ilya, Nowak Callie, Hoyt Margaret, Houle Christy, Kahan Scott
Eli Lilly and Company, Indianapolis, Indiana, USA.
National Center for Weight and Wellness, George Washington University School of Medicine, Washington, DC, USA.
Diabetes Obes Metab. 2025 Jun;27(6):3061-3071. doi: 10.1111/dom.16311. Epub 2025 Mar 11.
Numerous risk factors for the development of obesity have been identified, yet its aetiology is not well understood. Traditional statistical methods for analysing observational data are limited by the volume and characteristics of large datasets. Machine learning (ML) methods can analyse large datasets to extract novel insights into risk factors for obesity. This study used ML and a large electronic medical records (EMR) database to predict which adults were at risk of a ≥10% increase from index body mass index (BMI) within 12 months.
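The outcome can be made concrete as a per-patient labelling rule. The sketch below is a minimal Python illustration, assuming the outcome is met when any BMI recorded in the 365 days after the index date is at least 10% above the index BMI. The abstract does not state the exact window or which follow-up measurement is used, and `BmiRecord` and `gained_10_percent` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical labelling rule for the study outcome. Assumptions (not stated in
# the abstract): the window is 365 days after the index date, and the outcome is
# met if ANY follow-up BMI in that window reaches 110% of the index BMI.


@dataclass
class BmiRecord:
    days_since_index: int  # days after the index date (the index visit itself is day 0)
    bmi: float             # body mass index in kg/m^2


def gained_10_percent(index_bmi, follow_up, window_days=365, threshold=0.10):
    """Return True if any BMI recorded in days 1..window_days is >= (1 + threshold) * index_bmi."""
    if index_bmi <= 0:
        raise ValueError("index BMI must be positive")
    cutoff = index_bmi * (1.0 + threshold)
    return any(
        0 < rec.days_since_index <= window_days and rec.bmi >= cutoff
        for rec in follow_up
    )


if __name__ == "__main__":
    history = [BmiRecord(90, 25.5), BmiRecord(300, 27.6)]
    print(gained_10_percent(25.0, history))  # True: 27.6 exceeds the 27.5 cutoff
```

A boolean like this would serve as the binary target for the classifiers listed in the methods; changing `window_days`, or requiring the last rather than any measurement, would change who counts as a case.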
ML algorithms were applied to EMR from Optum's de-identified Market Clarity Data, a US database. Models included extreme gradient boosting (XGBoost), random forest, simple logistic regression (without a feature selection procedure) and two penalised logistic regression models (Elastic Net and Least Absolute Shrinkage and Selection Operator [LASSO]). Performance metrics included the area under the receiver operating characteristic curve (AUC; used to select the best-performing model), average precision, Brier score, accuracy, recall, positive predictive value, Youden index, F1 score, negative predictive value and specificity.
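Most of the threshold-dependent metrics listed above derive from a 2×2 confusion matrix (the Youden index, for example, is sensitivity + specificity - 1), while the Brier score is the mean squared difference between predicted probability and observed outcome. The pure-Python sketch below computes them from scratch. The 0.5 decision threshold, the function names and the toy data are assumptions for illustration; the abstract does not report how the classification threshold was chosen, and this is not the study's evaluation code.

```python
import math

# Hypothetical helpers illustrating the threshold-based metrics and the Brier
# score named in the abstract. The 0.5 threshold is an assumption.


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count (tp, fp, tn, fn), labelling p >= threshold as a predicted case."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Undefined ratios (e.g. no predicted cases) are reported as NaN.
    return num / den if den else math.nan


def classification_metrics(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    recall = _ratio(tp, tp + fn)          # sensitivity
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)             # precision
    npv = _ratio(tn, tn + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": _ratio(2 * ppv * recall, ppv + recall),
        "youden": recall + specificity - 1,
        # The Brier score uses the probabilities directly, not thresholded labels.
        "brier": sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true),
    }


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 1, 0]
    y_prob = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1]
    for name, value in classification_metrics(y_true, y_prob).items():
        print(f"{name}: {value:.3f}")
```

On this toy data the model makes two true positives, one false positive, two true negatives and one false negative, giving an accuracy of 4/6, a Youden index of 1/3 and a Brier score of 0.145. Unlike AUC, every metric except the Brier score shifts if the threshold changes.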
The XGBoost model performed best at 12 months post-index, with an AUC of 0.75. Lower baseline BMI, any emergency room visit during the study period, absence of diabetes mellitus, absence of lipid disorders and younger age were among the top predictors of a ≥10% increase from index BMI.
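An AUC of 0.75 has a direct probabilistic reading: if one patient who met the ≥10% outcome and one who did not are drawn at random, the model assigns the higher predicted risk to the former about 75% of the time, with ties counted as half. The pure-Python sketch below computes AUC that way, together with average precision (another metric listed in the methods) using the step-wise definition, a sum of recall increments weighted by precision. Function names and toy data are illustrative assumptions, not the study's code.

```python
from itertools import groupby


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random case outranks a random non-case (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise AP: sum over distinct score thresholds of (recall gain) * precision."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one case")
    # Walk thresholds from highest to lowest score; tied scores form one threshold.
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    for _, group in groupby(ranked, key=lambda t: t[0]):
        for _, y in group:
            if y == 1:
                tp += 1
            else:
                fp += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 1, 0]
    y_score = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1]
    print(roc_auc(y_true, y_score))            # 8 of 9 case/non-case pairs ranked correctly
    print(average_precision(y_true, y_score))
```

The pairwise AUC loop is quadratic in sample size, which is fine for illustration but not for EMR-scale cohorts; in practice a sort-based implementation such as scikit-learn's `roc_auc_score` (and `average_precision_score` for average precision) would typically be used.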
The current study demonstrates an ML approach applied to EMR to identify individuals at risk of weight gain over 12 months. Providers may use this risk stratification to prioritise prevention strategies or earlier obesity intervention.