Elbashir Murtada K, Almotilag Abdullah, Mahmood Mahmood A, Mohammed Mohanad
Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72441, Saudi Arabia.
School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg 3209, South Africa.
Diagnostics (Basel). 2024 Sep 29;14(19):2178. doi: 10.3390/diagnostics14192178.
Background: Predicting cancer survival is vital for improving patient outcomes and guiding the choice of therapy. Integrating multi-omics data, including mRNA, miRNA, and DNA methylation data, can deepen understanding of the molecular behavior of cancer. In this study, we propose a graph attention network (GAT) model that uses these multi-omics data to predict survival in non-small cell lung cancer (NSCLC).
Methods: The omics data were obtained from The Cancer Genome Atlas (TCGA), preprocessed, and merged into a single dataset by sample ID. We used the chi-square test to select the most significant features for the model, the synthetic minority oversampling technique (SMOTE) to balance the dataset, and the concordance index (C-index) to measure model performance on different combinations of omics data.
Results: The model achieved its highest C-index when mRNA and miRNA data were used together, which suggests that a multi-omics approach can be effective for predicting survival. Further pathway analysis with KEGG showed that the GAT model assigned high weights to features associated with viral entry pathways, such as the Epstein-Barr virus and Influenza A pathways, which are involved in lung cancer development. These findings indicate that the proposed GAT model significantly improves survival prediction by exploiting the strengths of multiple omics datasets and the findings from the enriched pathways. The GAT model outperforms other state-of-the-art methods used for NSCLC prediction.
Conclusions: We developed a new GAT-based model for NSCLC survival prediction from multi-omics data. The model showed strong predictive performance, and KEGG analysis of the selected significant features showed that they are implicated in pivotal biological processes underlying pathways such as Influenza A and Epstein-Barr virus infection, which are linked to lung cancer progression.
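The abstract reports model performance as a concordance index (C-index) but does not describe how it was computed. As a minimal sketch, assuming Harrell's pairwise definition for right-censored data, the metric could be calculated as below. The function name `concordance_index`, its arguments, and the toy values are hypothetical illustrations, not the authors' implementation or data.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times, and skip pairs where the
        # earlier patient was censored (their ordering is unknown).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk died first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
perfect = concordance_index(times, events, [0.9, 0.4, 0.6, 0.1])
inverted = concordance_index(times, events, [0.1, 0.6, 0.4, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and 0.0 to a fully inverted ranking. Pairs in which the patient with the shorter follow-up was censored are excluded, which is why the metric suits survival data with incomplete follow-up.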