Kang Seong Uk, Nam Seung-Joo, Kwon Oh Beom, Yim Inhyeok, Kim Tae-Hoon, Yeo Na Young, Lim Myoung Nam, Kim Woo Jin, Park Sang Won
Department of Bigdata, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea.
Department of Convergence Security, Kangwon National University, Chuncheon 24341, Republic of Korea.
Cancers (Basel). 2024 Dec 25;17(1):30. doi: 10.3390/cancers17010030.
Gastric cancer is a leading cause of cancer-related mortality, particularly in East Asia, with a notable burden in the Republic of Korea. This study aimed to develop machine learning models to predict mortality in gastric cancer patients and to identify associated risk factors. All data were acquired from the Korean Clinical Data Utilization for Research Excellence, with data contributed by multiple medical centers in South Korea. A total of 23,717 gastric cancer patients were investigated and divided into two groups by cause of mortality: all-cause (n = 2664) and disease-specific (n = 1620). We used comprehensive data integrating clinical, pathological, lifestyle, and socio-economic factors. Cox proportional hazards analysis was conducted to estimate hazard ratios for mortality. Five machine learning models (random forest, gradient boosting machine, XGBoost, LightGBM, and CatBoost) were developed to predict mortality, and the models were interpreted using SHAP (SHapley Additive exPlanations), an explainable AI technique. For all-cause mortality, the gradient boosting machine model demonstrated the highest performance, with an AUC-ROC of 0.795. For disease-specific mortality, the LightGBM model outperformed the others, achieving an AUC-ROC of 0.867. Significant predictors included the AJCC 7th edition stage, tumor size, lymph node count, and lifestyle factors such as smoking, drinking, and diabetes. This study underscores the importance of integrating clinical and lifestyle data to improve the accuracy of mortality prediction in gastric cancer patients. The findings highlight the need for personalized treatment approaches in the Korean population and emphasize the role of population-specific data in predictive modeling.
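The AUC-ROC values reported in the abstract (0.795 and 0.867) summarize how well each model ranks patients by predicted risk: AUC is the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. The following is a minimal sketch of that metric, assuming binary labels (1 = died, 0 = survived) and one predicted risk score per patient. The function name `auc_roc` is hypothetical; the abstract does not describe how AUC was computed, and in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC-ROC via the rank-sum (Mann-Whitney U) identity.

    Assumes labels are 0/1 (1 = death, an illustrative assumption) and that
    higher scores mean higher predicted risk. Tied scores receive their
    average rank, which counts each tied positive/negative pair as 0.5.
    """
    n = len(scores)
    if len(labels) != n:
        raise ValueError("labels and scores must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Sort indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for positives, normalized by the number of pos/neg pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data, not from the study: 3 of the 4 positive/negative pairs are
    # ranked correctly, so the AUC is 0.75.
    print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The rank-based formulation runs in O(n log n), which matters for a cohort of 23,717 patients, whereas comparing every positive/negative pair directly would be quadratic.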