Department of Surgery, University of Wisconsin, Madison, Wisconsin.
Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin.
J Surg Res. 2023 Nov;291:7-16. doi: 10.1016/j.jss.2023.05.015. Epub 2023 Jun 15.
Weight gain among young adults continues to increase. Identifying adults at high risk for weight gain and intervening before they gain weight could have a major public health impact. Our objective was to develop and test electronic health record-based machine learning models to predict weight gain in young adults with overweight/class 1 obesity.
Seven machine learning models were assessed: three regression models, a random forest, a single-layer neural network, gradient-boosted decision trees, and a support vector machine (SVM). Four categories of predictors were included: 1) demographics; 2) obesity-related health conditions; 3) laboratory data and vital signs; and 4) neighborhood-level variables. The cohort was split 60:40 for model training and validation. Areas under the receiver operating characteristic curve (AUCs) were calculated to determine each model's accuracy at identifying high-risk individuals, defined as those with ≥10% total body weight gain within 2 y. Variable importance was measured via generalized analysis of variance procedures.
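The evaluation steps described above (a 60:40 split, a ≥10% weight-gain outcome, and AUC as the accuracy measure) can be illustrated with a minimal sketch. It is not the study's code: the function names `split_60_40`, `is_high_risk`, and `roc_auc`, the random seed, and the use of kilograms are assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than any particular library routine.

```python
import random


def split_60_40(records, seed=0):
    """Shuffle a copy of the records and split it 60:40 (training, validation).

    The seed is a hypothetical choice so the split is reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def is_high_risk(baseline_kg, followup_kg, threshold=0.10):
    """Return True if relative weight gain from baseline is at least the threshold."""
    return (followup_kg - baseline_kg) / baseline_kg >= threshold


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise definition.

    AUC equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case, with ties counted as 0.5.
    This loop is quadratic in cohort size, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under this definition an AUC of 0.5 corresponds to scores that carry no information about the outcome and 1.0 to perfect separation, which is the scale on which the reported values should be read.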
Of the 24,183 patients in the study (mean [SD] age, 32.0 [6.3] y; 55.1% female), 14.2% gained ≥10% total body weight. Areas under the receiver operating characteristic curve ranged from 0.557 (SVM) to 0.675 (gradient-boosted decision trees). Age, sex, and baseline body mass index were the most important predictors in all models except the SVM and the neural network.
Our machine learning models performed similarly and had modest accuracy for identifying young adults at risk of weight gain. Future models may need to incorporate behavioral and/or genetic information to enhance model accuracy.