Tanvir Raihanul Bari, Islam Md Mezbahul, Sobhan Masrur, Luo Dongsheng, Mondal Ananda Mohan
Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.
Int J Mol Sci. 2024 Feb 28;25(5):2788. doi: 10.3390/ijms25052788.
Accurate cancer subtype prediction is crucial for personalized medicine. Integrating multi-omics data is a viable approach to understanding the intricate pathophysiology of complex diseases such as cancer. Conventional machine learning techniques are not well suited to analyzing the complex interrelationships among different types of omics data. Numerous graph-based learning models have been proposed to uncover latent representations and network structures specific to each omics type, with the aim of improving cancer prediction, characterizing patient profiles, and supporting other applications in disease management. The existing state-of-the-art graph-based multi-omics integration approaches for cancer subtype prediction, MOGONET and SUPREME, use a graph convolutional network (GCN), which does not account for how important each neighboring node is to a particular node. To address this gap, we hypothesize that attending to each neighbor, that is, weighting neighbors according to their importance, might improve cancer subtype prediction. A natural way to determine the importance of each neighbor of a node in a graph is the graph attention network (GAT). Here, we propose MOGAT, a novel multi-omics integration approach that leverages GAT models, which combine graph-based learning with an attention mechanism. MOGAT uses a multi-head attention mechanism to extract appropriate information for a specific sample by assigning unique attention coefficients to neighboring samples. To our knowledge, our group is the first to explore GAT in multi-omics integration for cancer subtype prediction. To evaluate the performance of MOGAT in predicting cancer subtypes, we used two breast cancer datasets, from TCGA and METABRIC. Our proposed approach, MOGAT, outperforms MOGONET by 32% to 46% and SUPREME by 2% to 16% in cancer subtype prediction across different scenarios, supporting our hypothesis. Our results also showed that GAT embeddings provide better prognostic separation of the high-risk group from the low-risk group than raw features.
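The attention mechanism described above can be illustrated with a minimal NumPy sketch of a generic GAT layer in its standard formulation: each sample's features are linearly projected, a score is computed for every connected pair, and a softmax over each sample's neighbors yields the attention coefficients. This is not the MOGAT implementation. The function names, the toy patient graph, the feature sizes, and the random weights are all assumptions made for illustration, and the nonlinearity that usually follows aggregation is omitted for brevity.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU, used to score node pairs before the softmax."""
    return np.where(x > 0, x, slope * x)

def gat_head(h, adj, W, a, slope=0.2):
    """One attention head (illustrative sketch, not the authors' code).

    h:   (N, F) sample features
    adj: (N, N) adjacency matrix, nonzero where samples are neighbors
    W:   (F, F_out) projection matrix
    a:   (2 * F_out,) attention vector
    Returns the aggregated features (N, F_out) and the coefficients (N, N).
    """
    n = h.shape[0]
    z = h @ W
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a destination term.
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :], slope)
    # Keep only real neighbors; self-loops ensure every row has an entry.
    mask = (adj + np.eye(n)) > 0
    e = np.where(mask, e, -np.inf)
    # Numerically stable softmax over each sample's neighborhood.
    e = e - e.max(axis=1, keepdims=True)
    weights = np.exp(e)
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

def multi_head_gat(h, adj, heads):
    """Concatenate the outputs of several independent heads."""
    return np.concatenate([gat_head(h, adj, W, a)[0] for W, a in heads], axis=1)

# Toy example: 4 samples, 3 features each, 2 heads of width 2 (all hypothetical).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 0],
                      [1, 0, 0, 1],
                      [0, 0, 1, 0]])
heads = [(rng.normal(size=(3, 2)), rng.normal(size=4)) for _ in range(2)]
embeddings = multi_head_gat(features, adjacency, heads)
print(embeddings.shape)
```

Unlike a GCN, where neighbor weights are fixed by the graph structure, the coefficients here depend on the features of both samples, so two neighbors of the same sample can contribute unequally.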