Bae Chul-Young, Kim Bo-Seon, Jee Sun-Ha, Lee Jong-Hoon, Nguyen Ngoc-Dung
Mediage Research Center, Seongnam-si 13449, Republic of Korea.
Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul 03722, Republic of Korea.
Cancers (Basel). 2023 Sep 27;15(19):4757. doi: 10.3390/cancers15194757.
Cancer is one of the leading global health threats. Early, personalized prediction of cancer incidence is crucial for the population at risk. This study introduces a novel cancer prediction model based on modern recurrent survival deep learning algorithms. The study includes 160,407 participants from the blood-based cohort of the Korea Cancer Prevention Research-II Biobank, which has been ongoing since 2004. Data linkages were designed to ensure anonymity, and data were collected through nationwide medical examinations. Predictive performance on ten cancer sites, evaluated using the concordance index (c-index), was compared among nDeep and its multitask variation, Cox proportional hazards (PH) regression, DeepSurv, and DeepHit. Our models consistently achieved a c-index above 0.8 for all ten cancers, with a peak of 0.8922 for lung cancer, and outperformed Cox PH regression and the other survival deep neural networks. To the best of our knowledge, this study presents the survival deep learning model with the highest predictive performance on a censored health dataset. In the future, we plan to investigate the causal relationship between explanatory variables and cancer in order to reduce cancer incidence and mortality.
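For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a concordance index can be computed on right-censored data. It is not the authors' evaluation code: the function name `concordance_index`, the toy inputs, and the simplified handling of ties (pairs with equal follow-up times are skipped, tied risk scores count as half) are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell-style c-index for right-censored data.

    times  : observed follow-up time per participant
    events : 1 if the event (e.g. cancer diagnosis) was observed, 0 if censored
    risks  : predicted risk score; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Reorder so that participant i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter time ends in an observed
        # event; pairs with equal times are skipped in this simplified sketch.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0   # the model ranked the earlier event as riskier
        elif risks[i] == risks[j]:
            concordant += 0.5   # a tied prediction counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study).
times = [5, 8, 3, 10]
events = [1, 0, 1, 1]
risks = [0.4, 0.2, 0.9, 0.7]
result = concordance_index(times, events, risks)
print(result)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a c-index above 0.8 means that in more than 80% of comparable participant pairs the model assigned the higher risk to the participant whose event occurred first.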