School of Nursing, The University of Hong Kong, 3 Sassoon Road, Pokfulam, Hong Kong, PR China.
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, PR China.
BMC Public Health. 2024 May 20;24(1):1351. doi: 10.1186/s12889-024-18830-1.
Adolescent weight problems are a growing public health concern, making early prediction of non-normal weight status crucial for effective prevention. However, few temporal prediction tools covering all four adolescent weight statuses have been developed. This study aimed to predict the short- and long-term weight status of Hong Kong adolescents and to assess the importance of predictors.
A population-based retrospective cohort study of adolescents was conducted using data from a territory-wide voluntary annual health assessment service provided by the Department of Health in Hong Kong. Using diet habits, physical activity, psychological well-being, and demographics as predictors, we built six multiclass prediction models for successive weight status (normal, overweight, obese, and underweight): Decision Tree, Random Forest, k-Nearest Neighbor, eXtreme Gradient Boosting (XGBoost), support vector machine, and logistic regression. Model performance was evaluated using multiple standard classifier metrics and overall accuracy. Predictor importance was assessed using Shapley values.
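As an illustration of how Shapley values attribute a prediction to individual predictors, the following is a minimal sketch of the exact computation on a toy model. It is not the study's pipeline: the abstract does not say which implementation was used, and the scoring function `toy_score`, its coefficients, and the `student` and `reference` feature vectors are all hypothetical values invented for this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z: Sequence[float]) -> float:
    """Hypothetical linear score; coefficients are made up, not from the study."""
    weight_kg, height_cm, exercise_hours = z
    return 0.05 * weight_kg - 0.03 * height_cm - 0.2 * exercise_hours


# Hypothetical inputs: [weight in kg, height in cm, weekly hours of aerobic exercise].
student = [50.0, 140.0, 1.0]
reference = [35.0, 138.0, 3.0]

phi = shapley_values(toy_score, student, reference)
for name, contribution in zip(["weight", "height", "exercise hours"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for `student` and the score for `reference`, which is the efficiency property that makes Shapley values convenient for ranking predictors. For models with many predictors, exact enumeration is infeasible and approximate methods are normally used instead.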
A total of 442,898 Primary 4 (P4, Grade 4 in the US) and 344,186 Primary 6 (P6, Grade 6 in the US) students from the academic years 1995/96 to 2014/15, followed up until Secondary 6 (S6, Grade 12 in the US), were included. The XGBoost model consistently outperformed all other models in predicting long-term weight status at S6 from P4 or P6, achieving an overall accuracy of 0.72 and 0.74, a micro-averaged AUC of 0.92 and 0.93, and a macro-averaged AUC of 0.83 and 0.86, respectively. XGBoost also predicted each individual weight status accurately, with higher AUC values than the other models. Weight, height, sex, age, and the frequency and hours of aerobic exercise were consistently the most important predictors in both cohorts.
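The gap between the micro- and macro-averaged AUC values reported above comes from how the four classes are pooled. The sketch below shows one common one-vs-rest formulation, assuming class probabilities are available for each student; the labels and probabilities are invented toy data, and the function names are hypothetical rather than taken from the study.

```python
from typing import List, Sequence

CLASSES = ["normal", "overweight", "obese", "underweight"]


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of the one-vs-rest AUC of each class."""
    n_classes = len(proba[0])
    per_class: List[float] = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]
        scores = [row[k] for row in proba]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes


def micro_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    """AUC over all (student, class) pairs pooled into one binary problem."""
    n_classes = len(proba[0])
    labels = [1 if y == k else 0 for y in y_true for k in range(n_classes)]
    scores = [row[k] for row in proba for k in range(n_classes)]
    return binary_auc(labels, scores)


# Toy data: class indices follow CLASSES; "normal" dominates, as in most cohorts.
y_true = [0, 0, 0, 1, 2, 3]
proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.2, 0.1, 0.1, 0.6],
]

print(f"macro-averaged AUC: {macro_auc(y_true, proba):.3f}")
print(f"micro-averaged AUC: {micro_auc(y_true, proba):.3f}")
```

Macro averaging weights every weight status equally, so a weak rare class pulls the figure down. Micro averaging pools all decisions and therefore leans toward the majority class, which is one plausible reason for micro-averaged values exceeding macro-averaged ones on imbalanced data.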
Machine learning approaches can accurately predict adolescent weight status in both the short and long term. The developed multiclass model, which uses easily assessed variables, enables accurate long-term prediction of weight status and, if deployed in the health care system, could be used by adolescents and parents for self-prediction. The interpretable models may help provide early, individualized intervention suggestions, particularly for adolescents with weight problems.