Machine Learning-Based predictive model for adolescent metabolic syndrome: Utilizing data from NHANES 2007-2016.

Authors

Zhang Yu-Zhen, Wu Hai-Ying, Ma Run-Wei, Feng Bo, Yang Rui, Chen Xiao-Gang, Li Min-Xiao, Cheng Li-Ming

Affiliations

Department of Anesthesiology and Surgical Intensive Care Unit, Kunming Children's Hospital, Kunming, Yunnan, China.

Department of Emergency, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China.

Publication information

Sci Rep. 2025 Jan 25;15(1):3274. doi: 10.1038/s41598-025-88156-4.

Abstract

Metabolic syndrome (MetS) in adolescents is a growing public health issue linked to obesity, hypertension, and insulin resistance, and it increases the risk of cardiovascular disease and mental health problems. Early detection and intervention are crucial but are often hindered by complex diagnostic requirements. This study aimed to develop a predictive model from NHANES data that excludes biochemical indicators, providing a simple, cost-effective tool for large-scale, non-medical screening and early prevention of adolescent MetS. After excluding adolescents with missing diagnostic variables, the dataset comprised 2,459 adolescents from NHANES 2007-2016. We used LASSO regression with 20-fold cross-validation to select the variables with the greatest predictive value. The dataset was split into training and validation sets in a 7:3 ratio, and SMOTE was used to oversample the training set to a 1:1 ratio. Based on the training set, we built eight machine learning models and a multifactor logistic regression model, for nine predictive models in total. After all models were evaluated using confusion matrices, calibration curves, and decision curves, the LGB model showed the best predictive performance, with an AUC of 0.969, a Youden index of 0.923, an accuracy of 0.978, an F1 score of 0.989, and a Kappa value of 0.800. We further interpreted the LGB model using SHAP; the SHAP hive plot showed that the predictor variables were, in descending order of importance, age- and sex-specific BMI percentage, weight, upper arm circumference, thigh length, and race. Finally, we deployed the model online for broader accessibility. The predictive models we developed and validated demonstrated high performance, making them suitable for large-scale, non-medical primary screening and early warning of adolescent MetS. Online deployment of the model allows practical use in community and school settings, promoting early intervention and public health improvement.
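The abstract reports several summary metrics (Youden index, accuracy, F1 score, Kappa) that can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions in plain Python. The class name `ConfusionMatrix`, the helper function names, and the example counts are illustrative assumptions; they are not the study's code or data, and the abstract does not state which class was treated as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical container, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def sensitivity(cm: ConfusionMatrix) -> float:
    # Fraction of actual positives that were predicted positive.
    return cm.tp / (cm.tp + cm.fn)


def specificity(cm: ConfusionMatrix) -> float:
    # Fraction of actual negatives that were predicted negative.
    return cm.tn / (cm.tn + cm.fp)


def youden_index(cm: ConfusionMatrix) -> float:
    # Youden's J = sensitivity + specificity - 1.
    return sensitivity(cm) + specificity(cm) - 1.0


def accuracy(cm: ConfusionMatrix) -> float:
    return (cm.tp + cm.tn) / cm.total


def f1_score(cm: ConfusionMatrix) -> float:
    # Harmonic mean of precision and recall, written in terms of counts.
    return 2 * cm.tp / (2 * cm.tp + cm.fp + cm.fn)


def cohen_kappa(cm: ConfusionMatrix) -> float:
    # Observed agreement corrected for the agreement expected by chance.
    n = cm.total
    observed = (cm.tp + cm.tn) / n
    expected = (
        (cm.tp + cm.fp) * (cm.tp + cm.fn) + (cm.tn + cm.fn) * (cm.tn + cm.fp)
    ) / (n * n)
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up counts for illustration only; NOT results from the study.
    example = ConfusionMatrix(tp=40, fp=10, fn=10, tn=140)
    print(f"Youden index: {youden_index(example):.3f}")
    print(f"Accuracy:     {accuracy(example):.3f}")
    print(f"F1 score:     {f1_score(example):.3f}")
    print(f"Kappa:        {cohen_kappa(example):.3f}")
```

Because accuracy and F1 depend on class prevalence while Kappa corrects for chance agreement, the three can diverge on imbalanced data, which is one reason a study might report them together.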

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e04e/11762282/14880d21c671/41598_2025_88156_Fig1_HTML.jpg
