Decoding cardiovascular risk in Chinese middle-aged and elderly adults: a 9-year prospective study integrating machine learning with explainable AI based on CHARLS cohort.

Author information

Zhu Xing-Yu, Li Wei, Yuan Guo-Liang, Pan Xu-Yang

Affiliation

Department of Cardiovascular Medicine, Shu yang Hospital of Traditional Chinese Medicine, Shu Yang, Jiangsu Province, 223600, China.

Publication information

BMC Med Inform Decis Mak. 2026 Feb 26. doi: 10.1186/s12911-026-03389-1.

DOI: 10.1186/s12911-026-03389-1
PMID: 41749231

Abstract

BACKGROUND

Cardiovascular disease constitutes the most formidable public health challenge in China, accounting for 48.98% and 47.35% of mortality in rural and urban populations, respectively, affecting approximately 330 million individuals. Existing risk stratification models predominantly derive from Western populations, with the Framingham Risk Equation systematically overestimating cardiovascular risk by 276% in Chinese men and 102% in Chinese women, underscoring the critical imperative for population-specific predictive instruments. Although machine learning methodologies demonstrate considerable promise in cardiovascular risk prognostication, their inherent "black-box" characteristics substantially impede clinical translational implementation.

OBJECTIVE

Leveraging longitudinal cohort data from the China Health and Retirement Longitudinal Study (CHARLS) and integrating machine learning with explainable artificial intelligence techniques, we sought to develop and validate a cardiovascular disease long-term risk prediction model tailored to the Chinese middle-aged and elderly population, achieving optimal synthesis of predictive accuracy and clinical interpretability through quantitative risk factor contribution analysis.

METHODS

We incorporated four waves of CHARLS surveillance data spanning 2011-2020, with 8,080 participants aged ≥ 45 years completing 9-year follow-up after rigorous inclusion criteria application. Recursive feature elimination was employed to identify optimal predictors from 90 candidate variables. We systematically evaluated 12 machine learning algorithms encompassing linear, non-linear, ensemble learning, and deep learning methodologies, utilizing stratified random 7:3 partitioning for training and validation cohorts. SHAP (SHapley Additive exPlanations) methodology facilitated comprehensive global and local interpretability analyses, with decision curve analysis assessing clinical net benefit.
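The workflow described above (stratified 7:3 split, recursive feature elimination from 90 candidates to 18 predictors, then a gradient boosting classifier scored by AUC) might be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' code: the sample size, hyperparameters, random seeds, and the choice of a gradient boosting estimator inside the elimination step are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort (assumption): 90 candidate variables,
# roughly 22% of samples in the positive (event) class.
X, y = make_classification(
    n_samples=2000, n_features=90, n_informative=18, n_redundant=0,
    weights=[0.78, 0.22], random_state=0,
)

# Stratified 7:3 split so both cohorts keep a similar event rate.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive feature elimination down to 18 predictors, fitted on the
# training cohort only so the validation cohort stays unseen.
selector = RFE(
    estimator=GradientBoostingClassifier(n_estimators=30, random_state=0),
    n_features_to_select=18,
    step=8,  # drop 8 features per round to keep the example fast
)
selector.fit(X_train, y_train)

# Final gradient boosting model on the selected predictors.
model = GradientBoostingClassifier(random_state=0)
model.fit(selector.transform(X_train), y_train)

# Discrimination on the held-out validation cohort.
valid_prob = model.predict_proba(selector.transform(X_valid))[:, 1]
valid_auc = roc_auc_score(y_valid, valid_prob)
print(f"selected features: {selector.support_.sum()}, validation AUC: {valid_auc:.3f}")
```

Fitting the selector on the training cohort alone avoids leaking validation information into feature selection. Whether the study did this is not stated in the abstract.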

RESULTS

Among 5,699 training cohort participants, 1,248 (21.9%) experienced cardiovascular events during follow-up. Recursive feature elimination identified 18 pivotal predictive factors spanning lipid metabolism, anthropometric parameters, renal function, and glucose homeostasis domains. The gradient boosting machine demonstrated superior comprehensive performance, achieving validation cohort AUC of 0.798 (95% CI: 0.776-0.820), specificity of 98%, and positive predictive value of 78%. SHAP analysis revealed waist circumference, triglycerides, and hypertension history as the three predominant predictive factors, with mean absolute SHAP values significantly exceeding those of other variables. Individual risk attribution analysis demonstrated substantial heterogeneity: extremely high-risk individuals (predicted probability 0.991) exhibited synergistic multi-factorial risk amplification, with standardized waist circumference contributing +0.0778 SHAP value and triglycerides (477 mg/dL) contributing +0.0729; conversely, low-risk individuals (predicted probability 0.0393) demonstrated triglycerides (45.1 mg/dL) providing the maximal singular protective contribution of -0.166. Decision curve analysis confirmed positive net benefit across the 0-0.95 threshold probability spectrum, systematically surpassing conventional strategies.
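The per-individual contributions reported above rest on the additivity of Shapley values: the contributions of all features sum to the difference between the individual's prediction and a reference prediction. The sketch below computes exact Shapley values for a small hypothetical risk score to show that property. It is a simplification under stated assumptions: `toy_risk`, its coefficients, and the all-zero baseline are invented for illustration, and the SHAP library uses tree-specific algorithms and a background dataset instead of this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a single baseline.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when joining coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score (not the study's model) with one interaction."""
    waist, triglycerides, hypertension = z
    return (0.1 + 0.05 * waist + 0.04 * triglycerides
            + 0.1 * hypertension + 0.02 * waist * hypertension)


x = [1.5, 2.0, 1.0]         # standardized waist, standardized TG, hypertension flag
baseline = [0.0, 0.0, 0.0]  # assumed reference individual

phi = shapley_values(toy_risk, x, baseline)
for name, contribution in zip(["waist", "triglycerides", "hypertension"], phi):
    print(f"{name}: {contribution:+.4f}")
print(f"sum of contributions: {sum(phi):.4f}")
print(f"prediction minus baseline: {toy_risk(x) - toy_risk(baseline):.4f}")
```

In this toy score the waist-hypertension interaction term is split evenly between the two features, which is how Shapley values attribute joint effects.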

CONCLUSIONS

The gradient boosting machine model achieved superior discrimination (AUC 0.798, 95% CI 0.785-0.825) compared to Framingham (0.638) and China-PAR (0.654) scores for 9-year cardiovascular disease prediction in Chinese adults aged ≥ 45 years. Waist circumference, triglycerides, and hypertension emerged as principal predictive features, though SHAP-derived importance reflects statistical contribution rather than causal effects. Decision curve analysis demonstrated clinical utility across threshold probabilities 0.05-0.95, enabling flexible deployment from population screening (98.3% sensitivity) to targeted intervention (98.7% specificity). External validation in independent cohorts is essential to establish generalizability before clinical implementation.
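Decision curve analysis, cited above, compares strategies by net benefit at a threshold probability p_t: true positives per person minus false positives per person weighted by p_t / (1 - p_t). A minimal sketch of that calculation follows, using NumPy on synthetic predictions. The simulated scores and the 22% event rate are assumptions for illustration and do not reproduce the study's curves.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob) >= threshold
    n = y_true.size
    true_pos = np.sum(treat & (y_true == 1))
    false_pos = np.sum(treat & (y_true == 0))
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every individual."""
    prevalence = np.mean(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Synthetic outcomes and risk scores (assumption): events score higher on average.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.22, size=5000)
prob = np.clip(0.15 + 0.35 * y + rng.normal(0, 0.15, size=5000), 0.001, 0.999)

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(y, prob, pt)
    all_nb = net_benefit_treat_all(y, pt)
    print(f"threshold {pt:.1f}: model {model_nb:+.3f}, treat all {all_nb:+.3f}, treat none +0.000")
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all and treat-none (zero) strategies.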

CLINICAL TRIAL NUMBER

Not applicable.
