Kim Sehyun, Ryu Beomsang, Choi Mingee, Lee Sangyon, Shin Jaeyong, Hong Sok Chul
Department of Economics, Seoul National University College of Social Science, Gwanak-gu, Seoul, 08826, Republic of Korea.
Wellxecon, Gangnam-gu, Seoul, Republic of Korea.
Sci Rep. 2025 Jul 2;15(1):22585. doi: 10.1038/s41598-025-94888-0.
As the prevention and early management of cardiovascular and cerebrovascular diseases grow in importance, researchers worldwide are building and comparing risk prediction models from health examination big data. This study used health insurance data to predict the incidence of cardiocerebrovascular disease (CCVD) with several models and compared their performance on samples with different initial risk levels. The data covered 410,859 individuals in the National Health Insurance Service national sample cohort between 2002 and 2019. Several linear models were applied to predict the occurrence of CCVD in two distinct samples. Models based on logistic regression with penalty terms added to the objective function were used, and their predictive performance was compared using multiple evaluation metrics, including the area under the receiver operating characteristic curve. The logistic regression model using variables selected by the LASSO algorithm showed better predictive performance than the other models, although the differences were not statistically significant. The models performed better on samples with higher incidence rates and initial risk levels. The findings underscore the importance of developing diverse models to predict diseases such as CCVD, which have high medical costs and incidence rates, and can thereby inform the development of healthcare policies.
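For readers unfamiliar with penalized logistic regression, the objective behind a LASSO-type model can be written as below. This is the standard textbook form, given as a sketch only: the abstract does not state the exact objective, scaling, or tuning procedure the authors used, and the symbols (n individuals, covariate vector x_i, binary CCVD outcome y_i, penalty weight \lambda) are notation assumed here for illustration.

```latex
% Standard L1-penalized (LASSO) logistic regression objective.
% Notation is assumed for illustration, not taken from the paper.
\[
  (\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr]
      + \lambda \lVert \beta \rVert_1
    \right\},
  \qquad
  p_i = \frac{1}{1 + \exp\bigl( -(\beta_0 + x_i^{\top} \beta) \bigr)}
\]
```

With \lambda > 0, the L1 term can shrink some coefficients exactly to zero, which is what allows LASSO to act as a variable selection step. Other penalized variants replace only the final term.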