Jawara Dawda, Lauer Kate V, Venkatesh Manasa, Stalter Lily N, Hanlon Bret, Churpek Matthew M, Funk Luke M
Department of Surgery, University of Wisconsin, Madison, Wisconsin.
Department of Surgery, University of Wisconsin, Madison, Wisconsin; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin.
J Surg Res. 2025 Feb;306:43-53. doi: 10.1016/j.jss.2024.11.042. Epub 2024 Dec 31.
Obesity, defined as a body mass index ≥30 kg/m², is a major public health concern in the United States. Preventative approaches are essential, but they are limited by an inability to accurately predict individuals at highest risk of weight gain. Our objective was to develop accurate weight gain prediction models using the National Institutes of Health All of Us dataset. We hypothesized that machine learning models using both electronic health record and behavioral survey data would outperform models using electronic health record data alone.
The All of Us dataset was used to identify adults between 18 and 70 y old with weight measurements 2 y apart between 2008 and 2022. Patients with a history of cancer, bariatric surgery, or pregnancy were excluded. Demographics, vital signs, laboratory results, comorbidities, and survey data (Alcohol Use Disorder Identification Test, Patient-Reported Outcomes Measurement Information System physical and mental health scores) were included as model parameters. Elastic net and XGBoost machine learning models were developed with and without survey data to predict ≥10% total body weight gain within 2 y. The data were split into a training sample (60%) and a testing sample (40%), and parameters were tuned using 10-fold cross-validation. Performance was compared using area under the receiver operating characteristic curves (AUCs).
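As a rough illustration of three steps named above (the ≥10% weight gain outcome, the 60%/40% split, and the AUC comparison), the following standard-library Python sketch may help. It is not the authors' code: the function names, the random seed, and the toy numbers are assumptions made for illustration, and the elastic net and XGBoost models themselves, along with the 10-fold cross-validation, are not reproduced here.

```python
import random


def gained_10_percent(baseline_kg, followup_kg):
    """Outcome label: True if weight rose by at least 10% of baseline."""
    return (followup_kg - baseline_kg) / baseline_kg >= 0.10


def split_train_test(rows, train_fraction=0.6, seed=0):
    """Shuffle rows and split them into training and testing samples.

    The seed and the shuffling scheme are illustrative assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy (baseline_kg, followup_kg) pairs, invented for illustration.
    weights = [(80.0, 90.0), (80.0, 85.0), (70.0, 69.0), (100.0, 112.0)]
    labels = [gained_10_percent(b, f) for b, f in weights]
    # Hypothetical predicted risks from some fitted model.
    scores = [0.9, 0.8, 0.3, 0.1]
    train, test = split_train_test(range(10))
    print(labels, len(train), len(test), auc(labels, scores))
```

The pairwise AUC is quadratic in sample size, which is fine for a toy check but would be replaced by a rank-based computation on a cohort of this size.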
Our cohort consisted of 34,715 patients (mean [SD] age 50.9 [13.4] y; 45.7% White; 55.3% female). Over a 2-y span, 10.4% of the cohort gained ≥10% total body weight. AUCs were 0.677 [95% DeLong confidence interval 0.665-0.688] for elastic net and 0.706 [0.695-0.717] for XGBoost. Incorporation of survey data did not improve predictability, with AUCs of 0.681 [0.669-0.692] and 0.705 [0.694-0.716], respectively.
Our machine learning weight gain prediction models had modest performance that was not improved by survey data. The addition of other All of Us variables, including genomic data, may be informative in future studies.