Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults.

Author information

Huang Xiao, Cao Tianyu, Chen Liangziqian, Li Junpei, Tan Ziheng, Xu Benjamin, Xu Richard, Song Yun, Zhou Ziyi, Wang Zhuo, Wei Yaping, Zhang Yan, Li Jianping, Huo Yong, Qin Xianhui, Wu Yanqing, Wang Xiaobin, Wang Hong, Cheng Xiaoshu, Xu Xiping, Liu Lishun

Affiliations

Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.

Biological Anthropology, University of California, Santa Barbara, Santa Barbara, CA, United States.

Publication information

Front Cardiovasc Med. 2022 May 6;9:901240. doi: 10.3389/fcvm.2022.901240. eCollection 2022.

Abstract

BACKGROUND

Stroke is a major global health burden, and risk prediction is essential for the primary prevention of stroke. However, uncertainty remains about the optimal prediction model for analyzing stroke risk. In this study, we aim to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and establish a general methodological pipeline for future analysis.

METHODS

The training set comprised 70% of the data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed on the remaining 30% of the CSPPT data (n = 6,211), and external validation was conducted on a nested case-control (NCC) dataset (n = 2,568). The primary outcome was first stroke. Four established analysis methods were compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data were analyzed separately with and without laboratory variables. Models were assessed by accuracy, sensitivity, specificity, kappa, and area under the receiver operating characteristic curve (AUC), with AUC as the primary metric. Data balancing techniques, including random under-sampling (RUS) and the synthetic minority over-sampling technique (SMOTE), were applied to the imbalanced training set.
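The data handling described above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' code. The helper names (`train_test_split`, `random_under_sample`, `smote`) are hypothetical. Stroke cases are assumed to be labelled 1 and non-cases 0, and feature rows are assumed to be numeric lists. RUS discards randomly chosen majority-class rows until both classes are the same size. SMOTE instead creates synthetic minority rows by interpolating between a minority row and one of its k nearest minority neighbours. As in the abstract, balancing is meant for the training set only.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=0):
    """Shuffle rows and split them into a training part and a held-out part (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    cut = int(round(train_frac * len(rows)))
    return [rows[i] for i in order[:cut]], [rows[i] for i in order[cut:]]


def random_under_sample(X, y, seed=0):
    """RUS: keep every minority row plus an equal-sized random subset of the majority class."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def smote(X, y, k=5, seed=0):
    """SMOTE: add synthetic label-1 rows until both classes are the same size.

    Assumes label 1 is the minority class and that it has at least two rows.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    n_new = sum(1 for label in y if label == 0) - len(minority)
    synthetic = []
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row, excluding the row itself
        others = [j for j in minority if j != base]
        neighbours = sorted(others, key=lambda j: _sq_dist(X[base], X[j]))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # the new point lies on the segment between the base row and the chosen neighbour
        synthetic.append([b + gap * (v - b) for b, v in zip(X[base], X[nb])])
    return X + synthetic, y + [1] * len(synthetic)
```

The CSPPT records total 14,491 + 6,211 = 20,702. A 0.7 fraction gives 0.7 × 20,702 ≈ 14,491 training rows, which matches the reported sizes. The abstract does not say whether the original split was stratified by outcome.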

RESULTS

The best model performance was observed in the RUS-applied RF model with laboratory variables. Compared with the null models (sensitivity = 0, specificity = 100, and mean AUC = 0.643), data balancing improved overall performance, and RUS showed the more satisfactory effect in the current study (RUS: sensitivity = 63.9, specificity = 53.7, and mean AUC = 0.624). Adding laboratory variables improved the performance of the analysis methods. All results were reconfirmed in the validation sets. The top 10 most important variables were identified using the best-performing analysis method.
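The metrics named in the Methods, and the null-model figures above, can be illustrated with a small pure-Python sketch. The function names are hypothetical and the code is not the authors' implementation. It assumes binary labels (1 = stroke) and reports accuracy, sensitivity, and specificity as percentages, as the abstract does. Kappa here is Cohen's kappa. AUC is computed through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = stroke."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity (all in %) and Cohen's kappa from thresholded predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    p_obs = (tp + tn) / n  # observed agreement
    # chance agreement from the predicted and actual class margins
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_obs - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": 100 * p_obs,
        "sensitivity": 100 * tp / (tp + fn) if tp + fn else 0.0,
        "specificity": 100 * tn / (tn + fp) if tn + fp else 0.0,
        "kappa": kappa,
    }


def auc(y_true, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))
```

Suppose a model's thresholded predictions are "no stroke" for everyone. It then has sensitivity 0, specificity 100, and kappa 0, which is why imbalanced data call for balancing. AUC, by contrast, is computed from the continuous scores rather than from thresholded predictions. Such a model can therefore still report an AUC above 0.5 if its scores tend to rank cases above non-cases.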

CONCLUSION

Among the tested methods, the most effective stroke prediction model in the targeted population is RUS-applied RF. Based on the insights from the current study, we provide general frameworks for building machine learning-based prediction models.
