An ensemble model for predicting dyslipidemia using 3-years continuous physical examination data.

Authors

Zhang Naiwen, Guo Xiaolong, Yu Xiaxia, Tan Zhen, Cai Feiyue, Dai Ping, Guo Jing, Dan Guo

Affiliations

School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China.

Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China.

Publication information

Front Physiol. 2024 Oct 24;15:1464744. doi: 10.3389/fphys.2024.1464744. eCollection 2024.

Abstract

BACKGROUND

Dyslipidemia has emerged as a significant clinical risk; its complications, including atherosclerosis and ischemic cerebrovascular disease, pose a serious threat to human health. Accurately predicting the onset of dyslipidemia is therefore important. This study uses ensemble learning to build a machine learning model for predicting dyslipidemia.

METHODS

This study included three consecutive years of physical examination data from 2,479 participants and used the data from the first two years to predict whether each participant would develop dyslipidemia in the third year. Features were selected using statistical methods and an analysis of the mutual information between features. Five machine learning models, namely support vector machine (SVM), logistic regression (LR), random forest (RF), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost), served as base learners for the ensemble model. The model was evaluated with the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA).
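The pipeline described above can be illustrated with a minimal sketch. Several parts are assumptions rather than the authors' method: the abstract does not say how the base learners are combined, so soft voting is used here; scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to keep the example dependency-light; feature selection ranks features by mutual information with the outcome, which is not the same as the paper's analysis of mutual information between features; and the data are synthetic.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the examination data (assumption: 30 numeric features).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Five base learners mirroring the families named in the abstract.
base_learners = [
    ("svm", SVC(probability=True, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
]

# Keep the 12 features with the highest mutual information with the label
# (assumption: a simplified substitute for the paper's selection procedure).
selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=12)

# Scale, select features, then average the base learners' predicted probabilities.
model = make_pipeline(
    StandardScaler(),
    selector,
    VotingClassifier(estimators=base_learners, voting="soft"),
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Ensemble AUC on held-out synthetic data: {auc:.3f}")
```

Because the data are synthetic, the printed AUC says nothing about the study's reported results; the sketch only shows how the components fit together.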

RESULTS

Experimental results show that the ensemble model achieves superior performance across several metrics, with an AUC of 0.88 ± 0.01 (p < 0.001), surpassing the base learners by margins of 0.04 to 0.20. Calibration curves and DCA also indicated good predictive performance. The study further explores the minimal feature set needed for accurate prediction and finds that the top 12 features alone are enough for dependable results. Among them, glycated hemoglobin (HbA1c) and carcinoembryonic antigen (CEA) are key indicators for model construction.
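For readers unfamiliar with DCA, the quantity it plots is the net benefit at a chosen risk threshold, NB = TP/N − (FP/N) × pt/(1 − pt). The sketch below computes it in plain Python and compares it with the "treat everyone" strategy. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions with probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every participant as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example: 10 participants, 4 of whom develop the condition.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.7, 0.35, 0.05]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.2f}, treat all: {nb_all:.2f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy); a decision curve repeats this comparison over a range of thresholds.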

CONCLUSIONS

Our results suggest that the proposed ensemble model has good predictive performance and could become an effective tool for personal health management.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a30/11540663/b4195c9e6f47/fphys-15-1464744-g001.jpg
