Liu Tianyi, Krentz Andrew J, Huo Zhiqiang, Ćurčin Vasa
School of Life Course & Population Sciences, King's College London, SE1 1UL London, UK.
Metadvice, 1025 St-Sulpice, Switzerland.
Rev Cardiovasc Med. 2025 Apr 25;26(4):37443. doi: 10.31083/RCM37443. eCollection 2025 Apr.
Cardiovascular disease (CVD) remains the foremost cause of morbidity and mortality worldwide. Recent advances in machine learning (ML) have shown substantial potential for improving risk stratification in primary prevention, surpassing conventional statistical models in predictive performance. Integrating ML with Electronic Health Records (EHRs) further enables more refined risk estimation by leveraging the granularity and breadth of longitudinal individual patient data. However, fundamental barriers persist, including limited generalizability, challenges in interpretability, and the absence of rigorous external validation, all of which impede widespread clinical deployment.
This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the Scale for the Assessment of Narrative Review Articles (SANRA) guidelines. A systematic literature search of the Medline and Embase databases was performed in March 2024 to identify studies published since 2010. Supplementary references were retrieved from the Institute for Scientific Information (ISI) Web of Science and through manual searches. The selection process, conducted in Rayyan, focused on systematic and narrative reviews evaluating ML-driven models for long-term CVD risk prediction in primary prevention settings using EHR data. Studies investigating short-term prognostication, highly specific comorbid cohorts, or conventional models without ML components were excluded.
Following an exhaustive screening of 1757 records, 22 studies met the inclusion criteria. Of these, 10 were systematic reviews (four incorporating meta-analyses) and 12 were narrative reviews, the majority published after 2020. The synthesis underscores the superiority of ML in modeling intricate EHR-derived risk factors, facilitating precision-driven cardiovascular risk assessment. Nonetheless, salient challenges endure: heterogeneity in CVD outcome definitions undermines comparability, data incompleteness and inconsistency compromise model robustness, and a dearth of external validation constrains clinical translatability. Moreover, ethical and regulatory considerations, including algorithmic opacity, equity in predictive performance, and the absence of standardized evaluation frameworks, pose formidable obstacles to seamless integration into clinical workflows.
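External validation and equity in predictive performance both come down to measuring how well a fixed model performs on data it was not developed on, overall and within patient subgroups. As an illustration only, the sketch below computes one common discrimination measure, the C-statistic, on a held-out cohort, pooled and per subgroup. It assumes a binary outcome observed over a fixed horizon with no censoring (time-to-event analyses would need a censoring-aware concordance index instead); the function names, subgroup labels, and toy risk values are hypothetical and are not drawn from the reviewed studies.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """C-statistic for a binary outcome (equivalent to ROC AUC).

    Fraction of (event, non-event) pairs in which the event case was
    assigned the higher predicted risk; tied risks count as one half.
    Assumes no censoring (hypothetical simplification).
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def c_statistic_by_group(risks, outcomes, groups):
    """Compute the C-statistic separately within each subgroup label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = c_statistic([risks[i] for i in idx],
                                [outcomes[i] for i in idx])
    return result


if __name__ == "__main__":
    # Hypothetical predicted 10-year risks from an already-trained model,
    # scored on an external cohort with observed outcomes (1 = CVD event).
    risks = [0.05, 0.20, 0.10, 0.40, 0.15, 0.30, 0.25]
    outcomes = [0, 1, 0, 1, 1, 0, 0]
    groups = ["A", "A", "A", "B", "B", "B", "B"]  # hypothetical subgroups

    print(round(c_statistic(risks, outcomes), 3))          # 0.667
    print(c_statistic_by_group(risks, outcomes, groups))   # {'A': 1.0, 'B': 0.5}
```

In this toy example the pooled C-statistic (about 0.67) hides the fact that discrimination is perfect in subgroup A but no better than chance in subgroup B, which is the kind of disparity that subgroup-level reporting on external data is meant to expose.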
Despite its transformative potential, ML-based CVD risk prediction remains encumbered by methodological, technical, and regulatory impediments that hinder its full-scale adoption in real-world healthcare settings. This review underscores the urgent need for standardized validation protocols, stringent regulatory oversight, and interdisciplinary collaboration to bridge the translational divide. Our findings establish an integrative framework for developing, validating, and applying ML-based CVD risk prediction algorithms, addressing both clinical and technical dimensions. To further advance this field, we propose a standardized, transparent, and regulated EHR platform that facilitates fair model evaluation, reproducibility, and clinical translation by providing a high-quality, representative dataset with structured governance and benchmarking mechanisms. Future efforts must also prioritize enhancing model transparency, mitigating biases, and ensuring adaptability to heterogeneous clinical populations, fostering equitable and evidence-based implementation of ML-driven predictive analytics in cardiovascular medicine.