
AI-driven analysis of diabetes risk determinants in U.S. adults: Exploring disease prevalence and health factors.

Author information

Majcherek Dawid, Ciesielski Antoni, Sobczak Paweł

Affiliations

Department of International Management, Collegium of World Economy, SGH Warsaw School of Economics, Warsaw, Poland.

Technical Schools Complex named after Waldemar Gostomczyk in Ostrów Wielkopolski, Ostrów Wielkopolski, Poland.

Publication information

PLoS One. 2025 Sep 3;20(9):e0328655. doi: 10.1371/journal.pone.0328655. eCollection 2025.

Abstract

BACKGROUND

Diabetes remains a major public health concern in the United States, with a complex interplay of behavioral, demographic, and clinical risk factors. This study aims to identify the three best-performing machine learning models for diabetes risk prediction and to visualize the most influential predictors affecting diabetes likelihood. By leveraging a large, representative dataset, the study contributes to evidence-based strategies for targeted prevention.

METHODS

Data were obtained from the 2015 Behavioral Risk Factor Surveillance System (BRFSS), a nationally representative, population-based survey collecting information on health behaviors, chronic conditions, and preventive care. The analytical sample included 253,680 adult respondents and more than twenty features spanning sociodemographic variables (e.g., age, sex, race, income, education), health behaviors (e.g., smoking, physical activity, diet), and health outcomes (e.g., BMI, hypertension, diabetes status). Eighteen machine learning models were trained and evaluated, including AdaBoost, Extra Trees Classifier, C5.0 Decision Tree, and CatBoost. Models were compared on predictive accuracy and area under the ROC curve (AUC). SHAP (SHapley Additive exPlanations) analysis was used to interpret the best-performing model and to examine how changes in key features influence diabetes risk.
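The abstract does not include the evaluation code. The sketch below is a minimal, dependency-free illustration of the two metrics named above for a binary diabetes label. It is not the authors' implementation.

* Accuracy is the share of correct 0/1 predictions.
* ROC AUC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one. This is the Mann-Whitney formulation, with ties counted as one half.

The function names, the 0.5 threshold and the toy labels and scores are illustrative assumptions.

```python
"""Dependency-free sketch of accuracy and ROC AUC for a binary label.
Function names, the 0.5 threshold and the toy data are hypothetical."""

from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted 0/1 labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (tied scores get average ranks)."""
    if len(y_true) != len(scores):
        raise ValueError("inputs must be of equal length")
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sort indices by score, then give each group of tied scores its average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts positive-over-negative pairs; dividing by all pairs gives AUC.
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1]                 # toy diabetes labels
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]    # toy predicted probabilities
    print(accuracy(y, [int(s >= 0.5) for s in p]))  # about 0.833
    print(roc_auc(y, p))                            # about 0.889
```

AUC depends only on how the scores rank the cases, so it does not depend on a decision threshold. Accuracy does depend on the threshold. When one class dominates the sample, a high accuracy can coexist with weak detection of the minority class, which is one reason to report both metrics.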

RESULTS

Among the evaluated models, the Extra Trees Classifier achieved the highest predictive accuracy (>90%) and an AUC of 0.99; AdaBoost and CatBoost also performed strongly. Feature importance analysis identified BMI, age, general health status, income, days of poor physical health, and education as the top predictors. A nonlinear association between income and diabetes risk was observed, with the highest prevalence among individuals earning $20,000-$25,000. Risk was also elevated among individuals aged 65-69 and those reporting poor general health. Hypertension showed a strong positive correlation with diabetes risk.
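The methods state that SHAP was used to interpret the best-performing model. The sketch below illustrates what a SHAP value measures by computing exact Shapley values for a tiny scoring function. It enumerates every coalition of features.

Two simplifications are assumptions, not the study's setup:

* Features outside a coalition are set to a single baseline value. SHAP implementations typically average over a background dataset instead.
* `toy_risk` is an invented three-feature score, not an Extra Trees model.

```python
"""Exact Shapley values by coalition enumeration (illustration only).
Assumptions: a single baseline vector stands in for "absent" features,
and toy_risk is a hypothetical score, not the study's model."""

from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical score: two additive terms plus one interaction z[0] * z[2]."""
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


if __name__ == "__main__":
    phi = shapley_values(toy_risk, [1, 2, 3], [0, 0, 0])
    print(phi)  # approximately [3.5, 6.0, 1.5]
```

The attributions sum to f(x) minus f(baseline), which is 11 here. The interaction term z[0] * z[2] is split equally between features 0 and 2.

Enumeration visits 2^(n-1) coalitions per feature, each needing two model evaluations, so it is only practical for a handful of features. For tree ensembles, the shap library's TreeExplainer computes SHAP values with a tree-specific algorithm instead.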

CONCLUSIONS

Machine learning models, particularly tree-based ensemble methods, offer robust tools for diabetes risk prediction. These findings support their integration into public health analytics for personalized risk assessment and data-driven prevention strategies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1cb/12407459/1d80c79e464a/pone.0328655.g001.jpg