Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea.
Comput Biol Med. 2024 Jun;176:108568. doi: 10.1016/j.compbiomed.2024.108568. Epub 2024 May 9.
Discovery of cancer type-specific driver genes is important for understanding the molecular mechanisms of each cancer type and for providing appropriate treatment. Recently, graph deep learning methods have become widely used for finding cancer-driver genes. However, previous methods had limited performance in individual cancer types because only a small number of cancer-driver genes were available for training and because the models were biased toward those training genes. Here, we introduce a novel pipeline, CancerGATE, that predicts cancer-driver genes using a graph attention autoencoder (GATE) trained in a self-supervised manner and that can be applied to each cancer type. CancerGATE utilizes biological network topology and multi-omics data from 20,079 samples across 15 cancer types from The Cancer Genome Atlas (TCGA). Attention coefficients calculated in the model are compared between cancer and normal contexts to prioritize cancer-driver genes. In each cancer type, CancerGATE achieves a higher AUPRC than previous graph deep learning models, with differences ranging from 1.5% to 36.5%. We also show that CancerGATE is free from the bias toward cancer-driver genes used in training, and that it reveals mechanisms of cancer-driver genes in specific cancer types. Finally, we propose novel cancer-driver gene candidates that could be therapeutic targets for specific cancer types.
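The abstract states that genes are prioritized by comparing attention coefficients between cancer and normal contexts, but it does not give the scoring formula. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes two gene-by-gene attention matrices of equal shape are already available, and it uses a hypothetical score: the total absolute change in attention over a gene's incoming and outgoing edges. The function name, the score definition, and the placeholder gene labels are all assumptions made for illustration.

```python
import numpy as np


def rank_genes_by_attention_shift(att_cancer, att_normal, genes):
    """Rank genes by how much their attention coefficients change between
    a cancer and a normal context.

    Hypothetical scoring, not taken from the paper: for each gene, sum the
    absolute differences over its row and its column of the two matrices.
    """
    att_cancer = np.asarray(att_cancer, dtype=float)
    att_normal = np.asarray(att_normal, dtype=float)
    if att_cancer.shape != att_normal.shape:
        raise ValueError("attention matrices must have the same shape")
    if att_cancer.ndim != 2 or att_cancer.shape[0] != att_cancer.shape[1]:
        raise ValueError("attention matrices must be square (gene x gene)")
    if att_cancer.shape[0] != len(genes):
        raise ValueError("one gene label is required per matrix row")

    # Edge-wise change in attention between the two contexts.
    diff = np.abs(att_cancer - att_normal)

    # Per-gene score: change summed over row (axis=1) and column (axis=0).
    score = diff.sum(axis=1) + diff.sum(axis=0)

    # Sort by descending score; a stable sort keeps input order for ties.
    order = np.argsort(-score, kind="stable")
    return [(genes[i], float(score[i])) for i in order]


# Toy data with placeholder gene labels (not real results).
genes = ["GENE_A", "GENE_B", "GENE_C"]
att_normal = [[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]]
att_cancer = [[0.0, 0.9, 0.1],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]]
ranked = rank_genes_by_attention_shift(att_cancer, att_normal, genes)
for name, value in ranked:
    print(f"{name}\t{value:.2f}")
```

In this toy input only the first gene's attention row differs between contexts, so it ranks first. A real pipeline would take the matrices from a trained graph attention model and would likely need normalization across genes with different numbers of neighbors.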