Robust identification key predictors of short- and long-term weight status in children and adolescents by machine learning.

Affiliations

School of Nursing, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.

Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.

Publication information

Front Public Health. 2024 Sep 24;12:1414046. doi: 10.3389/fpubh.2024.1414046. eCollection 2024.

PMID:39381765

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11458556/

Abstract

BACKGROUND

Early identification of children and adolescents at high risk of weight problems is crucial for implementing timely preventive measures. While machine learning (ML) techniques have shown promise in addressing this complex, high-dimensional problem, feature selection is vital for identifying the key predictors that can support effective and targeted interventions. This study aims to use a feature selection process to identify a robust and minimal set of predictors that can aid in the early prediction of short- and long-term weight problems in children and adolescents.

METHODS

We used demographic, physical, and psychological wellbeing predictors to model weight status (normal, underweight, overweight, or obese) over 1-, 3-, and 5-year prediction windows. To select the most influential features, we applied four feature selection methods, (1) Chi-Square test, (2) Information Gain, (3) Random Forest, and (4) eXtreme Gradient Boosting (XGBoost), in combination with six ML approaches. The stability of the feature selection methods was assessed using Jaccard's index, Spearman's rank correlation, and Pearson's correlation, and the models were evaluated with several accuracy metrics.
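The abstract names the selection and stability measures but not how they were computed. The sketch below is a minimal pure-Python illustration, assuming categorical (or pre-binned) predictors, of how a Chi-Square or Information Gain filter can score and rank features, and how Jaccard's index and Spearman's or Pearson's correlation can compare feature selections or importance rankings obtained, for instance, on different resamples of the data. All function names and the toy data are hypothetical; this is not the authors' code, and library implementations may differ (scikit-learn's `chi2`, for example, treats non-negative feature values as counts rather than building a contingency table). The tree-based methods (Random Forest, XGBoost) rank features by model-derived importance instead and are not shown.

```python
import math
from collections import Counter
from itertools import combinations

def chi_square_score(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    label_counts = Counter(labels)
    stat = 0.0
    for f, n_f in Counter(feature).items():
        for y, n_y in label_counts.items():
            expected = n_f * n_y / n
            stat += (joint[(f, y)] - expected) ** 2 / expected
    return stat

def entropy(values):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """H(labels) minus the conditional entropy H(labels | feature)."""
    n = len(labels)
    conditional = 0.0
    for f, n_f in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == f]
        conditional += n_f / n * entropy(subset)
    return entropy(labels) - conditional

def top_k(scores, k):
    """Names of the k highest-scoring features (ties keep insertion order)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def jaccard(a, b):
    """Size of the intersection over size of the union of two feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_pairwise_jaccard(feature_sets):
    """Average Jaccard index over all pairs of selections (e.g. from resamples)."""
    pairs = list(combinations(feature_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Invented toy data: two outcome classes, three pre-binned predictors.
    labels = ["normal", "normal", "obese", "obese", "normal", "obese"]
    features = {
        "weight": ["lo", "lo", "hi", "hi", "lo", "hi"],
        "sex": ["F", "M", "F", "M", "M", "F"],
        "age_band": ["young", "old", "old", "old", "young", "old"],
    }
    chi = {name: chi_square_score(col, labels) for name, col in features.items()}
    ig = {name: information_gain(col, labels) for name, col in features.items()}
    print(top_k(chi, 2), top_k(ig, 2))
    names = list(features)
    # Rank agreement between the two filters' scores for the same features.
    print(spearman([chi[n] for n in names], [ig[n] for n in names]))
```

In a stability analysis, the same selector would be rerun on several resamples and `mean_pairwise_jaccard` applied to the resulting top-k sets, while `spearman` or `pearson` would compare the full score vectors.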

RESULTS

A total of 3,862,820 student-visits were included in this population-based study, with a mean age of 11.6 years (SD = 3.64) in the training set and 10.8 years (SD = 3.50) in the temporal test set. From the initial set of 38 predictors, the best-performing feature selection method, the Chi-Square test with XGBoost models, identified 6, 9, and 13 features for 1-, 3-, and 5-year predictions, respectively. These feature sets demonstrated excellent stability and achieved prediction accuracies of 0.82, 0.73, and 0.70, macro-AUCs of 0.94, 0.86, and 0.83, and micro-AUCs of 0.96, 0.93, and 0.92 for the 1-, 3-, and 5-year prediction windows, respectively. Weight, height, sex, total self-esteem score, and age were consistently the most influential predictors across all prediction windows. In addition, several psychological and social wellbeing predictors showed relatively high importance in long-term weight status prediction.
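Because the outcome has four classes, AUC has to be averaged across classes. The following is a minimal pure-Python sketch of the standard one-vs-rest definitions, assuming the model outputs one probability per class for each student-visit: macro-AUC is the unweighted mean of the per-class AUCs, whereas micro-AUC pools every (sample, class) score into a single binary problem and therefore gives more weight to frequent classes, which is one reason the two averages can differ. Function names and the probability values are hypothetical; this is not the authors' evaluation code.

```python
def binary_auc(y_true, scores):
    """ROC AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def macro_micro_auc(y_true, proba, classes):
    """One-vs-rest AUCs: macro averages per-class AUCs; micro pools all (sample, class) pairs."""
    per_class, pooled_labels, pooled_scores = [], [], []
    for idx, cls in enumerate(classes):
        onehot = [1 if y == cls else 0 for y in y_true]
        scores = [row[idx] for row in proba]
        per_class.append(binary_auc(onehot, scores))
        pooled_labels += onehot
        pooled_scores += scores
    macro = sum(per_class) / len(per_class)
    micro = binary_auc(pooled_labels, pooled_scores)
    return macro, micro

def accuracy(y_true, proba, classes):
    """Share of samples whose highest-probability class is the true class."""
    preds = [classes[max(range(len(classes)), key=row.__getitem__)] for row in proba]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

if __name__ == "__main__":
    # Invented probabilities for the four weight-status classes.
    CLASSES = ["normal", "underweight", "overweight", "obese"]
    y_true = ["normal", "underweight", "overweight", "obese", "normal", "obese"]
    proba = [
        [0.70, 0.10, 0.10, 0.10],
        [0.20, 0.60, 0.10, 0.10],
        [0.30, 0.10, 0.40, 0.20],
        [0.10, 0.05, 0.45, 0.40],
        [0.40, 0.30, 0.20, 0.10],
        [0.10, 0.10, 0.20, 0.60],
    ]
    print(accuracy(y_true, proba, CLASSES), macro_micro_auc(y_true, proba, CLASSES))
```

The pairwise counting is quadratic and only meant to make the definitions explicit; at the scale of millions of visits, a sort-based AUC or scikit-learn's `roc_auc_score` (for micro-averaging, applied to labels binarized with `label_binarize`) would be the practical choice.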

CONCLUSIONS

We demonstrate the potential of ML in identifying key predictors of weight status in children and adolescents. While traditional anthropometric measures remain important, psychological and social wellbeing factors also emerge as crucial predictors, potentially informing targeted interventions to address weight problems in childhood and adolescence.
