Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts.

Author information

Li Yikuan, Salimi-Khorshidi Gholamreza, Rao Shishir, Canoy Dexter, Hassaine Abdelaali, Lukasiewicz Thomas, Rahimi Kazem, Mamouei Mohammad

Affiliations

Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK.

Nuffield Department of Women's and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK.

Publication information

Eur Heart J Digit Health. 2022 Oct 21;3(4):535-547. doi: 10.1093/ehjdh/ztac061. eCollection 2022 Dec.

DOI: 10.1093/ehjdh/ztac061

PMID: 36710898

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9779795/

Abstract

AIMS

Deep learning has come to dominate predictive modelling in many fields, but in medicine it has met with a mixed reception. In clinical practice, simple statistical models and risk scores continue to inform cardiovascular disease risk prediction. This is due in part to a knowledge gap about how deep learning models perform in practice when they are subject to dynamic data shifts, a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several machine learning-based and established risk models.

METHODS AND RESULTS

Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting the 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests) and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions and (ii) different time periods. In internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve.
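The validation design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Patient` fields, the region labels, the cutoff year, and the toy risks are all hypothetical, and calibration is summarised only by calibration-in-the-large (mean predicted risk minus observed event rate). In a real study the internal cohort would be a held-out part of the development data.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    region: str   # hypothetical region label
    year: int     # hypothetical baseline year
    risk: float   # model-predicted 5-year risk
    event: int    # 1 if the event occurred within 5 years, else 0


def auroc(scores, labels):
    """Chance that a random case is scored above a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(scores, labels):
    """Mean predicted risk minus observed event rate (0 means no average bias)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


def evaluate(cohort):
    scores = [p.risk for p in cohort]
    labels = [p.event for p in cohort]
    return {
        "n": len(cohort),
        "auroc": auroc(scores, labels),
        "citl": calibration_in_the_large(scores, labels),
    }


def validate_under_shift(patients, dev_regions, cutoff_year):
    """Evaluate on an internal cohort, a geographically shifted one, and a later one."""
    cohorts = {
        "internal": [p for p in patients
                     if p.region in dev_regions and p.year < cutoff_year],
        "geographic": [p for p in patients
                       if p.region not in dev_regions and p.year < cutoff_year],
        "temporal": [p for p in patients
                     if p.region in dev_regions and p.year >= cutoff_year],
    }
    return {name: evaluate(cohort) for name, cohort in cohorts.items()}


# Toy data, invented purely for illustration.
patients = [
    Patient("A", 2000, 0.9, 1), Patient("A", 2001, 0.2, 0),
    Patient("A", 2002, 0.7, 1), Patient("A", 2003, 0.1, 0),
    Patient("B", 2000, 0.6, 1), Patient("B", 2001, 0.7, 0),
    Patient("B", 2002, 0.8, 1), Patient("B", 2003, 0.3, 0),
    Patient("A", 2010, 0.5, 1), Patient("A", 2011, 0.5, 0),
    Patient("A", 2012, 0.4, 0), Patient("A", 2013, 0.6, 1),
]

for name, metrics in validate_under_shift(patients, {"A"}, 2005).items():
    print(name, metrics)
```

Reporting discrimination and calibration side by side for each cohort makes it visible when a shift degrades one but not the other.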

CONCLUSION

The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination, but if the prior distribution changes, the model may remain miscalibrated.
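One way to see why a change in the prior can hurt calibration without hurting discrimination is the textbook Bayes correction for pure prior shift, sketched below. It is not a method taken from the paper. It assumes that only the outcome prevalence changes while the distribution of predictors within cases and non-cases stays the same; the function name and the prevalence values are hypothetical.

```python
def adjust_for_prior_shift(risk, old_prevalence, new_prevalence):
    """Rescale a predicted risk when only the outcome prevalence has changed.

    On the odds scale the correction is a constant factor:
    new_odds = old_odds * [pi_new / (1 - pi_new)] / [pi_old / (1 - pi_old)].
    """
    for value in (risk, old_prevalence, new_prevalence):
        if not 0.0 < value < 1.0:
            raise ValueError("all inputs must lie strictly between 0 and 1")
    odds = risk / (1.0 - risk)
    old_prior_odds = old_prevalence / (1.0 - old_prevalence)
    new_prior_odds = new_prevalence / (1.0 - new_prevalence)
    shifted_odds = odds * new_prior_odds / old_prior_odds
    return shifted_odds / (1.0 + shifted_odds)


# Assumed prevalences: 10% in the development data, 5% in the new cohort.
old_prev, new_prev = 0.10, 0.05
risks = [0.02, 0.10, 0.20, 0.60]
adjusted = [adjust_for_prior_shift(r, old_prev, new_prev) for r in risks]

for before, after in zip(risks, adjusted):
    print(f"{before:.2f} -> {after:.3f}")
```

Because the correction multiplies every patient's odds by the same factor, it preserves the ranking of patients. The area under the receiver operating characteristic curve is therefore unchanged, while the unadjusted risks over-predict on average in the lower-prevalence cohort.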

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7cc/9779795/632aa06a678e/ztac061ga1.jpg
