Department of Big Data Analytics, Kyung Hee University, 26, Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Korea.
School of Management, Kyung Hee University, 26, Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Korea.
Int J Environ Res Public Health. 2022 Oct 21;19(20):13672. doi: 10.3390/ijerph192013672.
In this study, socioeconomic, medical treatment, and health check-up data from the National Health Insurance Service (NHIS) of Korea for 2010 to 2017 were analyzed. Socioeconomic, treatment, and health check-up data from a given year were used to develop a model that predicts high medical expenses in the following year. A distinguishing feature of this study is that it derives important variables related to high-cost users of medical services in Korea from the items of the national health check-up program. We framed high-cost medical expense prediction as a supervised classification problem and evaluated the performance of several classification algorithms. The models used were logistic regression, random forest, and XGBoost, which are reported to offer the best performance and explanatory power among the machine learning algorithms used in previous studies. Our experimental results show that the XGBoost model performed best, with 77.1% accuracy. The contribution of this study is to identify the variables that affect the prediction of high-cost medical expenses by analyzing medical bills with the health check-up variables and the Korean Standard Classification of Diseases (KCD) major disease groups as input variables. The study confirmed that musculoskeletal disorders (M) and respiratory diseases (J), the most frequently treated diseases in Korea, are important KCD disease groups for predicting future high costs. In addition, malignant neoplasms (C), which have a high medical cost per treatment, were confirmed to be a disease group related to the prediction of high future medical costs. Unlike previous studies, these results come from an analysis of data on all diseases, so the study is expected to be more meaningful when compared with results from other national health check-up data.
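The prediction setup described in the abstract, where one year's data are used to predict high cost in the next year, can be illustrated with a minimal sketch. It is not the authors' code. The record layout, the feature vectors, the function names `build_pairs` and `accuracy`, and the cost threshold that defines "high cost" are all assumptions made for illustration; the abstract does not state how high cost was defined.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (person_id, year) -> (feature vector, total medical cost).
# The features stand in for socioeconomic, check-up, and KCD-group variables.
Record = Tuple[List[float], float]


def build_pairs(
    records: Dict[Tuple[str, int], Record], threshold: float
) -> Tuple[List[List[float]], List[int]]:
    """Pair year-t features with a year-(t+1) high-cost label.

    `threshold` is an assumed cutoff for "high cost", not a value from the study.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for (pid, year), (features, _cost) in sorted(records.items()):
        following = records.get((pid, year + 1))
        if following is None:
            continue  # no next-year record, so no label is available
        X.append(features)
        y.append(1 if following[1] >= threshold else 0)
    return X, y


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy data, invented for illustration only.
records = {
    ("a", 2010): ([1.0, 0.0], 100.0),
    ("a", 2011): ([1.0, 1.0], 900.0),
    ("a", 2012): ([0.0, 1.0], 200.0),
    ("b", 2010): ([0.0, 0.0], 50.0),
    ("b", 2011): ([0.0, 1.0], 60.0),
}
X, y = build_pairs(records, threshold=500.0)
```

The resulting `X` and `y` could then be passed to any binary classifier, such as the logistic regression, random forest, or XGBoost models named in the abstract, and the predictions scored with `accuracy`.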