Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database.

Author information

Kim Joung Ouk Ryan, Jeong Yong-Suk, Kim Jin Ho, Lee Jong-Weon, Park Dougho, Kim Hyoung-Seop

Affiliations

Department of AI and Big Data, Swiss School of Management, 6500 Bellinzona, Switzerland.

Department of Cardiology, Brain and Vascular Center, Pohang Stroke and Spine Hospital, Pohang 37659, Korea.

Publication information

Diagnostics (Basel). 2021 May 25;11(6):943. doi: 10.3390/diagnostics11060943.

DOI: 10.3390/diagnostics11060943

PMID: 34070504

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8229422/

Abstract

BACKGROUND

This study proposes a cardiovascular disease (CVD) prediction model that applies machine learning (ML) algorithms to the National Health Insurance Service-Health Screening datasets.

METHODS

We extracted 4699 patients aged over 45 as the CVD group, diagnosed according to International Classification of Diseases codes I20-I25. In addition, 4699 randomly selected subjects without a CVD diagnosis were enrolled as the non-CVD group. The two groups were matched by age and gender. Various ML algorithms were applied to predict CVD, and the performance of all the prediction models was then compared.
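The matching step described above can be sketched in a few lines. This is only an illustration, not the authors' procedure: the abstract does not say how matching was done, so exact 1:1 matching on age and gender is an assumption, and the record keys (`id`, `age`, `sex`) and the function name `match_controls` are hypothetical.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, seed=0):
    """Pair each case with one random control of the same age and sex.

    Assumption: exact 1:1 matching. Cases with no remaining control
    in their (age, sex) group are skipped.
    """
    rng = random.Random(seed)

    # Group candidate controls by the matching key and shuffle each group
    # so that the control drawn for a case is random.
    pool = defaultdict(list)
    for person in candidates:
        pool[(person["age"], person["sex"])].append(person)
    for group in pool.values():
        rng.shuffle(group)

    pairs = []
    for case in cases:
        group = pool[(case["age"], case["sex"])]
        if group:
            # pop() removes the control, so each one is used at most once.
            pairs.append((case, group.pop()))
    return pairs


# Toy records, invented for illustration only.
cases = [
    {"id": "c1", "age": 52, "sex": "M"},
    {"id": "c2", "age": 61, "sex": "F"},
]
candidates = [
    {"id": "n1", "age": 52, "sex": "M"},
    {"id": "n2", "age": 52, "sex": "M"},
    {"id": "n3", "age": 61, "sex": "F"},
    {"id": "n4", "age": 47, "sex": "F"},
]
pairs = match_controls(cases, candidates)
```

Matching this way yields groups of equal size with identical age and gender distributions, which is consistent with the 4699 versus 4699 design reported in the abstract.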

RESULTS

The extreme gradient boosting, gradient boosting, and random forest algorithms exhibited the best average prediction accuracy (area under receiver operating characteristic curve (AUROC): 0.812, 0.812, and 0.811, respectively) among all algorithms validated in this study. Based on AUROC, the ML algorithms improved the CVD prediction performance, compared to previously proposed prediction models. Preexisting CVD history was the most important factor contributing to the accuracy of the prediction model, followed by total cholesterol, low-density lipoprotein cholesterol, waist-height ratio, and body mass index.
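For readers unfamiliar with the metric used to rank the models, AUROC can be read as the probability that a randomly chosen CVD subject receives a higher predicted score than a randomly chosen non-CVD subject. The sketch below computes it directly from that definition. The labels, scores, and model names are invented for illustration and are not the study's data; the function name `auroc` is hypothetical.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positives and negatives.

    Returns the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = CVD, 0 = non-CVD; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2],
    "model_b": [0.6, 0.7, 0.2, 0.65, 0.3, 0.1],
}
results = {name: auroc(labels, s) for name, s in model_scores.items()}
best = max(results, key=results.get)
```

The pairwise loop is quadratic in the number of subjects, so it suits small examples; for datasets of the size used in the study, a rank-based or library implementation would be the practical choice.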

CONCLUSIONS

Our results indicate that the proposed CVD prediction model, which applies ML algorithms to health screening data, is readily applicable, produces validated results, and outperforms previous CVD prediction models.

Similar articles

Machine learning based risk prediction for Parkinson's disease with nationwide health screening data.

Sci Rep. 2022 Nov 14;12(1):19499. doi: 10.1038/s41598-022-24105-9.

Machine learning algorithms identify hypokalaemia risk in people with hypertension in the United States National Health and Nutrition Examination Survey 1999-2018.

Ann Med. 2023 Dec;55(1):2209336. doi: 10.1080/07853890.2023.2209336.

Comparison Between Statistical Model and Machine Learning Methods for Predicting the Risk of Renal Function Decline Using Routine Clinical Data in Health Screening.

Risk Manag Healthc Policy. 2022 Apr 26;15:817-826. doi: 10.2147/RMHP.S346856. eCollection 2022.

Prediction for cardiovascular diseases based on laboratory data: An analysis of random forest model.

J Clin Lab Anal. 2020 Sep;34(9):e23421. doi: 10.1002/jcla.23421. Epub 2020 Jul 29.

Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: A study involving 143,043 Chinese patients with hypertension.

Front Cardiovasc Med. 2022 Nov 14;9:1025705. doi: 10.3389/fcvm.2022.1025705. eCollection 2022.

Mortality Prediction of Patients With Cardiovascular Disease Using Medical Claims Data Under Artificial Intelligence Architectures: Validation Study.

JMIR Med Inform. 2021 Apr 1;9(4):e25000. doi: 10.2196/25000.

An evolutionary machine learning algorithm for cardiovascular disease risk prediction.

PLoS One. 2022 Jul 28;17(7):e0271723. doi: 10.1371/journal.pone.0271723. eCollection 2022.

Development and verification of prediction models for preventing cardiovascular diseases.

PLoS One. 2019 Sep 19;14(9):e0222809. doi: 10.1371/journal.pone.0222809. eCollection 2019.

Prediction of Long-Term Stroke Recurrence Using Machine Learning Models.

J Clin Med. 2021 Mar 20;10(6):1286. doi: 10.3390/jcm10061286.

Cited by

Supervised Machine Learning Algorithms for Fitness-Based Cardiometabolic Risk Classification in Adolescents.

Sports (Basel). 2025 Aug 18;13(8):273. doi: 10.3390/sports13080273.

Atherosclerotic Cardiovascular Disease Risk Prediction Models in China, Japan, and Korea: Implications for East Asians?

JACC Asia. 2025 Mar;5(3 Pt 1):333-349. doi: 10.1016/j.jacasi.2025.01.006.

Discovering Vitamin-D-Deficiency-Associated Factors in Korean Adults Using KNHANES Data Based on an Integrated Analysis of Machine Learning and Statistical Techniques.

Nutrients. 2025 Feb 8;17(4):618. doi: 10.3390/nu17040618.

Machine learning based prediction models for cardiovascular disease risk using electronic health records data: systematic review and meta-analysis.

Eur Heart J Digit Health. 2024 Oct 27;6(1):7-22. doi: 10.1093/ehjdh/ztae080. eCollection 2025 Jan.

Chronic Disease Prediction Using the Common Data Model: Development Study.

JMIR AI. 2022 Dec 22;1(1):e41030. doi: 10.2196/41030.

Machine Learning-Based Predictive Models for Detection of Cardiovascular Diseases.

Diagnostics (Basel). 2024 Jan 8;14(2):144. doi: 10.3390/diagnostics14020144.

Survival Prediction Model for Patients with Hepatocellular Carcinoma and Extrahepatic Metastasis Based on XGBoost Algorithm.

J Hepatocell Carcinoma. 2023 Dec 13;10:2251-2263. doi: 10.2147/JHC.S429903. eCollection 2023.

Measures of socioeconomic advantage are not independent predictors of support for healthcare AI: subgroup analysis of a national Australian survey.

BMJ Health Care Inform. 2023 May;30(1). doi: 10.1136/bmjhci-2022-100714.

Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts.

Eur Heart J Digit Health. 2022 Oct 21;3(4):535-547. doi: 10.1093/ehjdh/ztac061. eCollection 2022 Dec.

Economics of Artificial Intelligence in Healthcare: Diagnosis vs. Treatment.

Healthcare (Basel). 2022 Dec 9;10(12):2493. doi: 10.3390/healthcare10122493.
