Department of Radiology, National Health Insurance Service Ilsan Hospital, Goyang, South Korea.
Research Institute, National Health Insurance Service Ilsan Hospital, Goyang, South Korea.
BMC Cancer. 2021 Jun 29;21(1):755. doi: 10.1186/s12885-021-08498-w.
Almost all Koreans are covered by mandatory national health insurance and are required to undergo health screening at least once every 2 years. We aimed to develop a machine learning model to predict the risk of developing hepatocellular carcinoma (HCC) based on the screening results and insurance claim data.
The National Health Insurance Service-National Health Screening database was used for this study (NHIS-2020-2-146). The study cohort consisted of 417,346 health screening examinees without a history of cancer who were screened between 2004 and 2007; it was split into training and test cohorts by examination date (before or after 2005). Robust predictors were selected using Cox proportional hazards regression on 1000 different bootstrapped datasets. Random forest and extreme gradient boosting algorithms were used to develop a prediction model for the 9-year risk of HCC development after screening. After the prediction model was optimized via cross-validation in the training cohort, it was validated in the test cohort.
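The bootstrap-based predictor selection described above can be illustrated with a minimal sketch. It only shows the resampling and counting logic, under stated assumptions: `bootstrap_selection_frequency` and `toy_select` are hypothetical names, the data are synthetic, and the toy selector (a difference in group means) is a stand-in for the Cox proportional hazards fit used in the study, not a reproduction of it.

```python
import random
from collections import Counter


def bootstrap_selection_frequency(rows, select_fn, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each predictor is selected.

    rows      : list of records (here, dicts)
    select_fn : callable taking a resampled list of rows and returning the
                names of the predictors it selects (hypothetical interface)
    """
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        # Draw n records with replacement to form one bootstrap dataset.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select_fn(sample)))
    return {name: c / n_boot for name, c in counts.items()}


def toy_select(sample, threshold=5.0):
    """Stand-in selector (NOT Cox regression): keep a predictor when the
    mean difference between cases and non-cases exceeds the threshold."""
    cases = [r for r in sample if r["event"] == 1]
    controls = [r for r in sample if r["event"] == 0]
    if not cases or not controls:
        return []
    chosen = []
    for name in ("age", "noise"):
        mean_cases = sum(r[name] for r in cases) / len(cases)
        mean_controls = sum(r[name] for r in controls) / len(controls)
        if abs(mean_cases - mean_controls) > threshold:
            chosen.append(name)
    return chosen


def make_toy_rows(seed=1):
    """Synthetic data: 'age' separates cases from non-cases, 'noise' does not."""
    rng = random.Random(seed)
    rows = []
    for i in range(200):
        event = 1 if i < 50 else 0
        rows.append({
            "event": event,
            "age": (65 if event else 45) + rng.uniform(-5, 5),
            "noise": rng.uniform(0, 10),
        })
    return rows


freq = bootstrap_selection_frequency(make_toy_rows(), toy_select, n_boot=200)
print(freq)
```

A predictor that is selected in nearly every resample can be treated as robust, while one that appears only occasionally is likely an artifact of a particular sample. The cutoff on selection frequency is a design choice that this abstract does not specify.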
Of the total examinees, 0.5% (1799/331,694) in the training cohort and 0.4% (390/85,652) in the test cohort were diagnosed with HCC. Among the selected predictors, older age, male sex, obesity, abnormal liver function tests, a family history of chronic liver disease, underlying chronic liver disease, chronic hepatitis virus or human immunodeficiency virus infection, and diabetes mellitus were associated with increased risk of HCC development, whereas higher income, elevated total cholesterol, and underlying dyslipidemia or schizophrenic/delusional disorders were associated with decreased risk (p < 0.001). In the test cohort, the model showed good discrimination and calibration: the C-index, AUC, and Brier skill score were 0.857, 0.873, and 0.078, respectively.
A machine learning-based model could be used to predict the risk of HCC development from health screening examination results and claim data.
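For readers less familiar with the reported metrics, the following is a minimal sketch of how an AUC and a Brier skill score can be computed for binary outcomes. It assumes the common definition of the skill score, in which the reference forecast assigns every examinee the observed prevalence; the abstract does not state which reference the authors used. The function names and toy values are illustrative only, and the C-index, which accounts for censored follow-up time, is not covered here.

```python
def auc(y_true, y_prob):
    """Probability that a random case is ranked above a random non-case
    (ties count as one half). Pairwise form, fine for small examples."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true, y_prob):
    """1 - BS / BS_ref, where the reference predicts the prevalence for all.

    Assumption: prevalence-based reference; other references are possible.
    """
    prevalence = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [prevalence] * len(y_true))
    return 1.0 - brier_score(y_true, y_prob) / reference


# Toy outcomes and predicted risks (not study data).
y = [0, 0, 0, 1]
p = [0.1, 0.2, 0.3, 0.8]
print(auc(y, p), brier_score(y, p), brier_skill_score(y, p))
```

With a prevalence-based reference, the reference Brier score equals prevalence × (1 − prevalence), so for a rare outcome the denominator of the skill score is small. A skill score above 0 indicates that the model's predicted risks are closer to the outcomes than the constant-prevalence forecast.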