School of Science, Xi'an University of Architecture and Technology, Xi'an, Shaanxi, 710055, China.
School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China.
Lab Invest. 2022 Oct;102(10):1064-1074. doi: 10.1038/s41374-022-00801-y. Epub 2022 Jul 9.
Advances in deep learning have provided effective solutions for prediction tasks in the biomedical field. However, accurate prognosis prediction from cancer genomics data remains challenging because of the severe overfitting caused by the curse of dimensionality inherent in high-throughput sequencing data. Survival analysis poses an additional challenge: it is difficult to make use of censored samples, whose events of interest are not observed. Convolutional neural network (CNN) models offer the opportunity to extract meaningful hierarchical features that characterize cancer subtypes and prognostic outcomes. Feature selection, in turn, can mitigate overfitting and reduce the computational burden of subsequent model training by screening significant genes out of redundant ones. To simplify the model, we developed a concise and efficient survival analysis model, named the CNN-Cox model, which combines a special CNN framework with the prognosis-related feature selection method cascaded Wx and requires less computation because it has few trainable parameters. Experimental results show that the CNN-Cox model achieved consistently higher C-index values and better survival prediction performance than existing state-of-the-art survival analysis methods across seven cancer type datasets in The Cancer Genome Atlas cohort: bladder carcinoma, head and neck squamous cell carcinoma, kidney renal cell carcinoma, brain low-grade glioma, lung adenocarcinoma (LUAD), lung squamous cell carcinoma, and skin cutaneous melanoma. As an illustration of model interpretation, we examined potential prognostic gene signatures in the LUAD dataset using the proposed CNN-Cox model.
We conducted protein-protein interaction network analysis to identify potential prognostic genes and further analyzed the biological function of 13 hub genes (ANLN, RACGAP1, KIF4A, KIF20A, KIF14, ASPM, CDK1, SPC25, NCAPG, MKI67, HJURP, EXO1, and HMMR), whose high expression is significantly associated with poor survival of LUAD patients. These findings confirm that the CNN-Cox model is effective in extracting not only prognostic factors but also biologically meaningful gene features. The code is available on GitHub: https://github.com/wangwangCCChen/CNN-Cox
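The abstract reports performance as C-index values on data that include censored samples. As a reading aid, the following is a minimal sketch of how Harrell's concordance index is commonly computed for right-censored data. It is not taken from the authors' repository; the function name `concordance_index`, its argument names, and the toy data are assumptions made for illustration only.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       : observed follow-up time per patient
    events      : 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores : model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A censored patient cannot anchor a pair: their true event time is unknown.
        if events[i] != 1:
            continue
        for j in range(n):
            # The pair is comparable when patient i had the event strictly
            # before patient j's observed (event or censoring) time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event, higher risk: correct order
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the second one censored.
toy_times = [5, 8, 12, 3]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.1, 0.7]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
print(toy_c_index)
```

In this toy cohort, five pairs are comparable and four of them are ranked correctly, so the sketch returns 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The censored patient still contributes as the later member of a pair, which is how the metric makes partial use of censored samples.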