Primary prevention cardiovascular disease risk prediction model for contemporary Chinese (1°P-CARDIAC): Model derivation and validation using a hybrid statistical and machine-learning approach.

Author information

Zhou Yekai, Lin Celia Jiaxi, Yu Qiuyan, Blais Joseph Edgar, Wan Eric Yuk Fai, Wong Emmanuel, Tan Kathryn, Siu David Chung-Wah, Yiu Kai Hang, Chan Esther Wai Yin, Yu Doris, Wong William, Lam Tak-Wah, Wong Ian Chi Kei, Luo Ruibang, Chui Celine S L

Affiliations

School of Computing and Data Science, The University of Hong Kong, Hong Kong Special Administrative Region, China.

Laboratory of Data Discovery for Health (D24H), Hong Kong Science Park, Hong Kong Science and Technology Park, Hong Kong Special Administrative Region, China.

Publication information

PLoS One. 2025 Jul 28;20(7):e0322419. doi: 10.1371/journal.pone.0322419. eCollection 2025.

Abstract

BACKGROUND

Cardiovascular disease (CVD) is the leading cause of mortality and morbidity in China and worldwide, yet validated primary prevention models developed specifically for Chinese populations are lacking. To identify individuals at high risk of CVD for early intervention, we developed and validated a primary prevention risk prediction model, Personalized CARdiovascular DIsease risk Assessment for Chinese (1°P-CARDIAC), in contemporary Chinese cohorts in Hong Kong.

METHODS

Patients with no history of CVD were assigned to derivation and validation cohorts according to the geographical location of their residence in Hong Kong. The outcome was the first diagnosis of a composite of coronary heart disease, ischemic or hemorrhagic stroke, peripheral artery disease, and revascularization. The full model incorporated all available variables in the dataset, including clinical laboratory tests, disease and medication history, family history of disease, demographic factors, and healthcare utilization. We used an XGBoost Cox model for derivation and multivariate imputation by chained equations (MICE) to replace missing data. A basic model was developed from a subset of statistically significant and important risk variables selected by least absolute shrinkage and selection operator (LASSO) regression. Validation used 1000 bootstrap replicates, and performance was compared with four existing models: PREDICT, the pooled cohort equations (PCE), China-PAR, and Framingham (Asian).
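The abstract does not say how the bootstrap was applied, so the following is only a minimal sketch of one common way to bootstrap a discrimination measure: compute Harrell's C-statistic, resample subjects with replacement, and take percentile limits. The function names `concordance_index` and `bootstrap_ci`, the percentile method, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import random


def concordance_index(times, events, scores):
    """Harrell's C: share of comparable pairs in which the subject with the
    earlier observed event also has the higher predicted risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:  # j was still event-free when i had the event
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_ci(times, events, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic (an assumed method)."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects
        try:
            stats.append(concordance_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [scores[k] for k in idx],
            ))
        except ValueError:
            continue  # skip degenerate resamples with no comparable pairs
    stats.sort()
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Hypothetical toy data: follow-up time (years), event flag, risk score.
    times = [2.1, 5.0, 3.3, 7.8, 1.2, 6.4, 4.9, 8.5]
    events = [1, 0, 1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.2, 0.4, 0.1, 0.7, 0.5, 0.6, 0.3]
    print("C-statistic:", concordance_index(times, events, scores))
    print("95% bootstrap interval:", bootstrap_ci(times, events, scores))
```

The pairwise loop is quadratic in the number of subjects, which is fine for a sketch but would need a faster algorithm for cohorts of the size reported here.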

RESULTS

The study included 179,953 patients in the derivation cohort and 1,083,924 patients across two independent validation cohorts. The full model included 103 covariates, whereas the basic model included 8. The full model performed well, with a C-statistic of 0.87 (95% CI: 0.87, 0.87) and a calibration slope of 0.94. The basic model had a C-statistic of 0.75 (95% CI: 0.75, 0.75) and a calibration slope of 0.91. The comparison risk models had lower C-statistics, ranging from 0.68 to 0.72.
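For readers unfamiliar with the calibration slope, the sketch below shows one simplified way to estimate it: regress a binary outcome at a fixed time horizon on the logit of the predicted risk and read off the coefficient. This is an assumption made for illustration. The abstract does not state how the slope was computed, and for a survival model it is often estimated from a Cox regression on the linear predictor, which also accounts for censoring. The function name `calibration_slope` and the grouped toy data are hypothetical.

```python
import math


def calibration_slope(pred_risk, outcome, n_iter=25, tol=1e-10):
    """Fit outcome ~ a + b * logit(pred_risk) by Newton-Raphson.

    Returns (a, b): the calibration intercept and calibration slope.
    Predicted risks must lie strictly between 0 and 1.
    """
    x = [math.log(p / (1.0 - p)) for p in pred_risk]
    a, b = 0.0, 1.0  # start from "perfectly calibrated"
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcome):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # gradient with respect to a
            g1 += (yi - mu) * xi   # gradient with respect to b
            h00 += w               # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


if __name__ == "__main__":
    # Hypothetical groups of 10 subjects with observed event rates of
    # 20%, 50%, and 80%, but with predictions that are too extreme.
    pred = [0.04 / 0.68] * 10 + [0.5] * 10 + [0.64 / 0.68] * 10
    obs = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
    print(calibration_slope(pred, obs))
```

A slope of 1 indicates that predicted risks are neither too extreme nor too conservative; a slope below 1, as in the toy data, indicates predictions that are too extreme.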

CONCLUSION

We developed and validated 1°P-CARDIAC, a CVD risk prediction model for primary prevention, using a novel hybrid statistical and machine-learning approach. Validation results suggest that it may outperform commonly used risk models. The basic and full 1°P-CARDIAC models yielded similar levels of accuracy and performance. The model demonstrated both effectiveness and versatility in harnessing big data, and it has the potential to support CVD primary prevention and improve public health outcomes.
