A stacking ensemble model for predicting the occurrence of carotid atherosclerosis.

Author information

Department of Data Science, School of Statistics and Mathematics, Shandong University of Finance and Economics, Jinan, China.

Information Technology Division, Shandong International Trust Co., Ltd., Jinan, China.

Publication information

Front Endocrinol (Lausanne). 2024 Jul 23;15:1390352. doi: 10.3389/fendo.2024.1390352. eCollection 2024.

Abstract

BACKGROUND

Carotid atherosclerosis (CAS) is a significant risk factor for cardio-cerebrovascular events. The objective of this study is to employ stacking ensemble machine learning techniques to enhance the prediction of CAS occurrence, incorporating a wide range of predictors, including endocrine-related markers.

METHODS

Using data from a routine health check-up cohort, five individual prediction models for CAS were built with logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost) and gradient boosting decision tree (GBDT) methods. A stacking ensemble algorithm was then used to integrate the base models, with the aim of improving predictive ability and mitigating overfitting. Finally, SHAP values were used to analyze variable importance at both the overall and individual levels, with a focus on the impact of endocrine-related variables.

RESULTS

Of the 1669 subjects in the cohort, 441 were diagnosed with CAS. Seventeen variables were selected as predictors. The ensemble model outperformed the individual models, with AUCs of 0.893 in the testing set and 0.861 in the validation set. It achieved the highest accuracy, precision, recall and F1 score in the validation set and also performed well in the testing set. Carotid stenosis and age emerged as the most significant predictors, alongside notable contributions from endocrine-related factors.
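For readers less familiar with the metrics reported above, here is a minimal pure-Python sketch of how accuracy, precision, recall, F1 score and AUC are computed for a binary classifier. The labels, scores and the 0.5 decision threshold are made-up toy values for illustration, not data from the study, and the helper names are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half). O(n^2), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values, purely illustrative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics)    # accuracy, precision, recall and F1 are all 0.75 here
print(auc_value)  # 0.9375
```

AUC is computed from the ranking of the scores and does not depend on a threshold, while the other four metrics depend on the chosen cut-off. That is why a model can lead on AUC without leading on every thresholded metric.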

CONCLUSION

The ensemble model shows enhanced accuracy and generalizability in predicting CAS risk, underscoring its utility in identifying individuals at high risk. This approach integrates a comprehensive analysis of predictors, including endocrine markers, affirming the critical role of endocrine dysfunctions in CAS development. It represents a promising tool in identifying high-risk individuals for the prevention of CAS and cardio-cerebrovascular diseases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbaf/11300245/9e551f472206/fendo-15-1390352-g001.jpg