Kim Joung Ouk Ryan, Jeong Yong-Suk, Kim Jin Ho, Lee Jong-Weon, Park Dougho, Kim Hyoung-Seop
Department of AI and Big Data, Swiss School of Management, 6500 Bellinzona, Switzerland.
Department of Cardiology, Brain and Vascular Center, Pohang Stroke and Spine Hospital, Pohang 37659, Korea.
Diagnostics (Basel). 2021 May 25;11(6):943. doi: 10.3390/diagnostics11060943.
This study proposes a cardiovascular disease (CVD) prediction model that applies machine learning (ML) algorithms to the National Health Insurance Service-Health Screening datasets.
We extracted 4699 patients aged over 45 years as the CVD group, diagnosed according to International Classification of Diseases codes I20-I25. In addition, 4699 randomly selected subjects without a CVD diagnosis were enrolled as the non-CVD group. The two groups were matched by age and gender. Various ML algorithms were applied to CVD prediction, and the performance of all prediction models was then compared.
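The matching step described above can be illustrated with a minimal sketch. It assumes exact 1:1 matching on age and gender with random selection among eligible controls; the abstract does not specify the matching procedure, so the function name `match_controls`, the record fields (`id`, `age`, `sex`), and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, seed=0):
    """Pair each case with one unused control of the same age and sex.

    Assumption: exact 1:1 matching with random choice among eligible
    controls. This is an illustration, not the study's documented method.
    """
    rng = random.Random(seed)

    # Group candidate controls by their (age, sex) stratum.
    pool = defaultdict(list)
    for person in candidates:
        pool[(person["age"], person["sex"])].append(person)

    # Shuffle each stratum so that popping from it is a random draw.
    for group in pool.values():
        rng.shuffle(group)

    matched = []
    for case in cases:
        group = pool[(case["age"], case["sex"])]
        if group:  # cases with no eligible control are left unmatched
            matched.append((case, group.pop()))
    return matched


# Toy records, invented purely for illustration.
cases = [
    {"id": 1, "age": 52, "sex": "F"},
    {"id": 2, "age": 67, "sex": "M"},
]
candidates = [
    {"id": 10, "age": 52, "sex": "F"},
    {"id": 11, "age": 67, "sex": "M"},
    {"id": 12, "age": 67, "sex": "M"},
    {"id": 13, "age": 49, "sex": "F"},
]
pairs = match_controls(cases, candidates)
```

Each control is removed from its stratum once used, so no control is paired with more than one case.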
The extreme gradient boosting, gradient boosting, and random forest algorithms exhibited the best average predictive performance (area under the receiver operating characteristic curve (AUROC): 0.812, 0.812, and 0.811, respectively) among all algorithms validated in this study. Based on AUROC, the ML algorithms improved CVD prediction performance compared with previously proposed prediction models. Preexisting CVD history was the most important factor contributing to the accuracy of the prediction model, followed by total cholesterol, low-density lipoprotein cholesterol, waist-height ratio, and body mass index.
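For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen CVD subject receives a higher predicted score than a randomly chosen non-CVD subject, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the labels and scores are hypothetical and are not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = CVD, 0 = non-CVD, scores = predicted risk.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
value = auroc(labels, scores)  # 7 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of subjects, so it is suitable only as an explanation; a rank-based computation is preferable at the scale of the cohort described here.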
Our results indicate that the proposed CVD prediction model, which applies ML algorithms to a health screening dataset, is readily applicable, produces validated results, and outperforms previous CVD prediction models.