Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA.
Greehey Children's Cancer Research Institute, The University of Texas Health San Antonio, San Antonio, Texas 78229, USA.
Methods. 2021 Aug;192:120-130. doi: 10.1016/j.ymeth.2021.01.004. Epub 2021 Jan 21.
Survival rates for breast, prostate, testicular, and colon cancers have increased significantly over the past two decades, whereas brain and pancreatic cancers have much lower median survival rates that have improved little over the last forty years. This poses the challenge of finding gene markers for early cancer detection and for treatment strategies. Different methods, including regression-based Cox-PH, artificial neural networks, and, more recently, deep learning algorithms, have been proposed to predict cancer survival. In this work, we established a novel graph convolution neural network (GCNN) approach, called Surv_GCNN, to predict survival for 13 different cancer types using the TCGA dataset. For each cancer type, 6 Surv_GCNN models were trained to predict the risk score (RS): one for each of three graphs (generated by correlation analysis, by the GeneMania database, and by correlation + GeneMania), each trained with and without clinical data. The performance of the 6 Surv_GCNN models was compared with that of two existing models, Cox-PH and Cox-nnet. The results showed that Cox-PH had the worst performance among the 8 tested models across the 13 cancer types, while the Surv_GCNN models with clinical data achieved the best overall performance, outperforming the competing models in 7 of the 13 cancer types: BLCA, BRCA, COAD, LUSC, SARC, STAD, and UCEC. A novel network-based interpretation of Surv_GCNN was also proposed to identify potential gene markers for breast cancer. The signatures learned by the nodes in the hidden layer of Surv_GCNN were identified and linked to potential gene markers by network modularization. The identified gene markers for breast cancer were compared with a total of 213 gene markers from three widely cited lists for breast cancer survival analysis. About 57% of the gene markers obtained by Surv_GCNN with the correlation + GeneMania graph either overlap or directly interact with these 213 genes, confirming the effectiveness of the markers identified by Surv_GCNN.
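The abstract names two building blocks without giving details: a gene graph generated by correlation analysis, and a Cox-style risk score. The sketch below illustrates one plausible form of each in NumPy. The function names, the absolute-correlation threshold of 0.6, and the use of a Breslow-style negative partial log-likelihood as the training objective are all assumptions made for illustration; the abstract does not state how Surv_GCNN builds its graph or which loss it optimizes.

```python
import numpy as np


def correlation_graph(expr, threshold=0.6):
    """Build a binary gene-gene adjacency matrix from expression data.

    expr: array of shape (n_samples, n_genes).
    threshold: hypothetical cutoff on |Pearson correlation| (an assumption,
    not a value taken from the paper).
    """
    corr = np.corrcoef(expr, rowvar=False)      # (n_genes, n_genes)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                  # drop self-loops
    return adj


def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood averaged over observed events.

    risk:  predicted risk scores, shape (n,)
    time:  follow-up times, shape (n,)
    event: 1 if the event was observed, 0 if censored, shape (n,)
    Tied event times are not given any special treatment here.
    """
    order = np.argsort(-time)                   # longest follow-up first
    risk = risk[order]
    event = event[order]
    # Running log-sum-exp: entry i covers every subject still at risk at time_i.
    log_risk_set = np.logaddexp.accumulate(risk)
    n_events = max(event.sum(), 1)
    return -np.sum((risk - log_risk_set) * event) / n_events


# Toy data: genes 0 and 1 are perfectly correlated, gene 2 is not.
toy_expr = np.array([[1.0, 2.0, 5.0],
                     [2.0, 4.0, 1.0],
                     [3.0, 6.0, 4.0],
                     [4.0, 8.0, 2.0]])
toy_adj = correlation_graph(toy_expr)

# With identical risk scores the loss reduces to the mean log risk-set size.
toy_loss = cox_neg_partial_log_likelihood(
    risk=np.zeros(3),
    time=np.array([1.0, 2.0, 3.0]),
    event=np.array([1, 1, 1]),
)
```

In this toy case only the edge between genes 0 and 1 survives the threshold. A knowledge-based graph such as one derived from GeneMania could, under the same assumptions, be combined with `toy_adj` by taking the element-wise maximum of the two adjacency matrices.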