Alharbi Fadi, Vakanski Aleksandar, Zhang Boyu, Elbashir Murtada K, Mohammed Mohanad
College of Engineering, Department of Computer Science, University of Idaho, Moscow, ID 83844, USA.
College of Computer and Information Sciences, Department of Information Systems, Jouf University, Sakaka, Al-Jouf 72441, Saudi Arabia.
IEEE Access. 2025;13:37724-37736. doi: 10.1109/access.2025.3540769. Epub 2025 Feb 11.
Recent studies on integrating multiple omics data have highlighted its potential to advance our understanding of cancer. Computational models based on graph neural networks and attention-based architectures have shown promising results for cancer classification because they can model complex relationships among biological entities. However, handling the high dimensionality and complexity of multi-omics data, and constructing graph structures that effectively capture the interactions between nodes, remain active areas of research. This study evaluates graph neural network architectures for multi-omics (MO) data integration based on graph-convolutional networks (GCN), graph-attention networks (GAT), and graph-transformer networks (GTN). Differential gene expression analysis and LASSO (Least Absolute Shrinkage and Selection Operator) regression are used for feature selection and to reduce the dimensionality of the omics data; hence, the developed models are referred to as LASSO-MOGCN, LASSO-MOGAT, and LASSO-MOGTN. Graph structures constructed from sample correlation matrices and from protein-protein interaction networks are investigated. Experimental validation is performed on a dataset of 8,464 samples from 31 cancer types and normal tissue, comprising messenger-RNA, micro-RNA, and DNA methylation data. The results show that models integrating multi-omics data outperformed models trained on single-omics data, with LASSO-MOGAT achieving the best overall performance at an accuracy of 95.9%. The findings also suggest that correlation-based graph structures enhance the models' ability to identify shared cancer-specific signatures across patients, compared with graph structures based on protein-protein interaction networks. The code and data used in this study are available on GitHub (https://github.com/FadiAlharbi2024/Graph_Based_Architecture.git).
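As an illustration of the correlation-based graph construction and graph-convolution step mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. The function names, the correlation threshold, the toy data sizes, and the simple concatenation of the three omics blocks are all assumptions made for this example; the abstract does not specify them, and the LASSO feature-selection step is omitted.

```python
import numpy as np

def correlation_graph(features, threshold=0.5):
    """Build a binary sample-sample adjacency matrix from Pearson correlations.

    features: array of shape (n_samples, n_features).
    threshold: hypothetical cutoff on |correlation|; not taken from the paper.
    """
    corr = np.corrcoef(features)  # rows are samples -> (n_samples, n_samples)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)    # self-loops are added during normalization
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used in GCNs."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)       # always >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, weight):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ weight, 0.0)

# Toy stand-ins for already-selected features (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
mrna = rng.normal(size=(6, 10))
mirna = rng.normal(size=(6, 5))
methylation = rng.normal(size=(6, 5))

# Assumed integration scheme: concatenate the omics blocks per sample.
x = np.hstack([mrna, mirna, methylation])  # shape (6, 20)

adj = correlation_graph(x, threshold=0.3)
a_norm = normalize_adjacency(adj)
w = rng.normal(size=(x.shape[1], 4))       # random weights; untrained
h = gcn_layer(a_norm, x, w)                # node embeddings, shape (6, 4)
```

In this sketch each node is a patient sample and an edge links two samples whose feature profiles are strongly correlated, which is one plausible reading of "sample correlation matrices". A graph-attention variant would replace the fixed normalized adjacency with learned, per-edge attention weights.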