Machine learning-based coronary heart disease diagnosis model for type 2 diabetes patients.

Authors

Chen Yingxi, Wang Chunyu, Liu Xiaozhu, Duan Minjie, Xiang Tianyu, Huang Haodong

Affiliations

Department of Anatomy, Institute of Neuroscience, Chongqing Medical University Basic Medical College, Chongqing, China.

Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, China.

Publication information

Front Endocrinol (Lausanne). 2025 May 22;16:1550793. doi: 10.3389/fendo.2025.1550793. eCollection 2025.

Abstract

BACKGROUND

This study aimed to establish a classification model to assist in diagnosing coronary heart disease (CHD) in patients with type 2 diabetes mellitus (T2DM).

METHODS

Patients with T2DM who underwent coronary angiography (CA) were enrolled from seven affiliated hospitals of Chongqing Medical University. Univariate analysis was used to test for statistical differences in clinical variables between T2DM patients with and without CHD. The original data were divided into a training set and a validation set in a 7:3 ratio. Features were screened on the training set using Logistic regression, Lasso regression, or recursive feature elimination (RFE). Five machine learning algorithms were selected for modeling: Logistic regression, Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XgBoost), and Light Gradient Boosting Machine (LightGBM). Model performance was verified through 5-fold cross-validation and the training set.
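The 7:3 split and 5-fold cross-validation described above can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was stratified by CHD status, so stratification, the random seed, and the helper names `stratified_split` and `k_fold` are all assumptions made here. The class counts are the cohort sizes reported in the Results.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, held_out_idx), splitting each class separately.

    Stratification is an assumption; the abstract only states a 7:3 ratio.
    """
    rng = random.Random(seed)
    train, held_out = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        held_out.extend(idx[cut:])
    return sorted(train), sorted(held_out)

def k_fold(indices, k=5, seed=0):
    """Yield (fit_idx, check_idx) pairs for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k near-equal parts
    for i in range(k):
        fit = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield fit, folds[i]

# Cohort sizes from the abstract: 1943 with CHD (label 1), 574 without (label 0).
labels = [1] * 1943 + [0] * 574
train_idx, held_out_idx = stratified_split(labels)
folds = list(k_fold(train_idx))
print(len(train_idx), len(held_out_idx), len(folds))
```

Feature screening and model fitting would then run inside each `fit_idx` portion only, so that the held-out indices never influence which features are kept.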

RESULTS

Clinical data were collected from 1943 patients with T2DM complicated with CHD and 574 T2DM patients without CHD. Univariate analysis identified 20 risk factors; four of them had more than 30% missing values, so 16 risk factors were ultimately included. Logistic regression screened eight features and Lasso regression screened ten, while the RFE method screened eight, fourteen, sixteen, and thirteen features for SVM, RF, XgBoost, and LightGBM, respectively. Among all models, the XgBoost model based on features selected by RFE+LightGBM demonstrated the best performance. In the testing set it achieved an AUC of 0.814 (95% CI, 0.779-0.847), accuracy of 0.799 (95% CI, 0.771-0.827), precision of 0.841 (95% CI, 0.812-0.868), recall of 0.920 (95% CI, 0.898-0.941), and F1-score of 0.879 (95% CI, 0.859-0.897).
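The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below does that and also shows one common way to attach a 95% confidence interval to a metric, a percentile bootstrap. The abstract does not state how its intervals were computed, so the bootstrap is an assumption, and the labels in `y_true` and `y_pred` are toy values invented for illustration, not study data.

```python
import random

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in sample],
                            [y_pred[i] for i in sample]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_score(0.841, 0.920)
print(round(reported_f1, 3))

# Toy labels (hypothetical): 80 positives, 20 negatives, 84 predicted correctly.
y_true = [1] * 80 + [0] * 20
y_pred = [1] * 72 + [0] * 8 + [0] * 12 + [1] * 8
acc = accuracy(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
print(acc, lo, hi)
```

The first printed value rounds to the F1-score given in the abstract, so the three reported figures are mutually consistent.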

CONCLUSIONS

Using T2DM patient data, a Bayesian-optimized XgBoost model was built on features selected by the RFE+LightGBM method. The model distinguished T2DM patients with CHD from those without, reaching an AUC of 0.814 in the testing set.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1a9/12137098/97207752a1fb/fendo-16-1550793-g001.jpg
