Machine Learning-Based Analysis of Lifestyle Risk Factors for Atherosclerotic Cardiovascular Disease: Retrospective Case-Control Study.

Author information

Kim Hye-Jin, Choi Heeji, Ahn Hyo-Jung, Shin Seung-Ho, Kim Chulho, Lee Sang-Hwa, Sohn Jong-Hee, Lee Jae-Jun

Affiliations

Artificial Intelligence Research Center, College of Medicine, Hallym University, Chuncheon-si, Republic of Korea.

Health Insurance Review and Assessment Research Institute, Health Insurance Review and Assessment Service, Wonju-si, Republic of Korea.

Publication information

JMIR Med Inform. 2025 Aug 7;13:e74415. doi: 10.2196/74415.

Abstract

BACKGROUND

The risk of developing atherosclerotic cardiovascular disease (ASCVD) varies among individuals and is related to a variety of lifestyle factors in addition to the presence of chronic diseases.

OBJECTIVE

We aimed to assess the predictive accuracy of machine learning (ML) models incorporating lifestyle risk behaviors for ASCVD risk using the Korean nationwide database.

METHODS

Using data from the Korea National Health and Nutrition Examination Survey, we applied 5 ML algorithms to predict high ASCVD risk: logistic regression (LR), support vector machine, random forest, extreme gradient boosting, and light gradient boosting. ASCVD risk was estimated with the pooled cohort equations, and a 10-year risk of ≥7.5% was defined as high risk. Among the 8573 participants aged 40-79 years, propensity score matching (PSM) was used to adjust for demographic confounders. We split the dataset into training and test sets in an 8:2 ratio and used bootstrapping when training the ML models, with the area under the receiver operating characteristic curve as the performance score. Shapley additive explanations were used to identify the variables most important to each model's assessment of high ASCVD risk. As a sensitivity analysis, we additionally performed binary LR analysis to check whether the ML models' results were consistent with a conventional statistical model.
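The abstract does not spell out how bootstrapping and the area under the receiver operating characteristic curve (AUROC) were combined. One plausible reading is that AUROC was recomputed on bootstrap resamples to gauge its variability. The sketch below illustrates that idea in plain Python under this assumption; the function names `auroc` and `bootstrap_auroc`, the number of resamples, and the percentile interval are illustrative choices, not the authors' implementation.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(labels, scores, n_boot=200, seed=0):
    """Resample cases with replacement and return the mean AUROC plus an
    approximate 95% percentile interval (hypothetical settings)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; draw again
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(0.025 * (n_boot - 1))]
    upper = stats[int(0.975 * (n_boot - 1))]
    return sum(stats) / n_boot, lower, upper
```

Here `labels` would hold 1 for participants whose pooled cohort equation risk is at or above the 7.5% threshold and 0 otherwise, and `scores` would hold a model's predicted probabilities on the test set.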

RESULTS

Of the 8573 participants, 41.7% (n=3578) had high ASCVD risk. Before PSM, age and sex differed significantly between groups. PSM (1:1) yielded 1976 participants with balanced demographics. After PSM, the high ASCVD risk group had higher alcohol or tobacco use, lower omega-3 intake, higher BMI, less physical activity, and less time spent sitting. Among the 5 ML models, the extreme gradient boosting model showed the highest area under the receiver operating characteristic curve, indicating the best overall discrimination between the high and low ASCVD risk groups. However, the light gradient boosting model performed better on accuracy, recall, and F1-score. Variable importance analysis using Shapley additive explanations identified smoking and age as the strongest predictors, while BMI, sodium or omega-3 intake, and low-density lipoprotein cholesterol were also important variables. Sensitivity analysis using multivariable LR confirmed these findings, showing that smoking, BMI, and low-density lipoprotein cholesterol increased ASCVD risk, whereas omega-3 intake and physical activity were associated with lower risk.
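One model can lead on the area under the curve while another leads on accuracy, recall, and F1-score because the former summarizes ranking quality across all cutoffs, whereas the latter three depend on a single decision threshold. The following minimal sketch shows how the threshold-dependent metrics are computed from predicted probabilities; the function name `classification_metrics` and the default cutoff of 0.5 are assumptions for illustration and are not taken from the study.

```python
def classification_metrics(labels, scores, threshold=0.5):
    """Compute accuracy, recall, precision, and F1 at one probability cutoff.

    labels: 1 for high ASCVD risk, 0 for low risk.
    scores: predicted probabilities of high risk.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```

Moving `threshold` up or down changes all four values without changing how the model ranks participants, which is why these metrics and the area under the curve can favor different models.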

CONCLUSIONS

Analyzing lifestyle behavioral factors in ASCVD risk with an ML model improves predictive performance compared with traditional models. Personalized prevention strategies tailored to an individual's lifestyle can effectively reduce ASCVD risk.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1de/12330983/f96fefd97c86/medinform-v13-e74415-g001.jpg
