Dou Yifan, Mirzaei Golrokh
Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210, United States.
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf405.
Cancer subtypes play a critical role in disease progression, prognosis, and treatment, making their detection essential for tailoring precision medicine. Studies have shown that multi-omics integration outperforms single-omics approaches in cancer subtyping tasks. However, due to the high dimensionality of multi-omics data, many existing studies either fail to capture the correlation between true labels and learned features, or lack sufficient capacity to model complex biological representations. These limitations prevent existing methods from fully exploiting the rich and complementary information embedded in multi-omics datasets.
We propose a framework for cancer subtyping that combines supervised feature learning and classification, built on graph-based learning with an attention mechanism. More specifically, we train graph convolutional network models on each omics dataset to extract latent representations, which are then concatenated to form a comprehensive multi-omics feature embedding. We further develop a sample fusion network based on the omics-specific graphs, incorporating the derived features and feeding them into a graph attention model for subtype classification. This two-stage multi-omics framework is applied to eight cancer types, with performance evaluated in terms of test accuracy, training time, and macro-averaged precision, recall, and F-score. Experimental results show that the proposed method outperforms state-of-the-art approaches across various cancer types. Additionally, we provide empirical evidence supporting the hypothesis that retaining a limited number of high-confidence edges and utilizing enriched embeddings from intermediate graph neural network layers can improve predictive performance.
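The two-stage data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the omics names (`mrna`, `methylation`), the cosine-similarity k-nearest-neighbor graph, fusing the sample graphs by taking the union of their edges, and the random, untrained weights. The sketch shows forward passes only; the supervised training of both stages is omitted.

```python
import numpy as np

def knn_graph(x, k):
    """Symmetric sample graph keeping only each sample's k most similar neighbors (assumed construction)."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sim = xn @ xn.T                          # cosine similarity between samples
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches from the top-k search
    idx = np.argsort(-sim, axis=1)[:, :k]    # indices of the k strongest edges per sample
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)            # symmetrize

def gcn_layer(adj, h, w):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)

def gat_layer(adj, h, w, a_src, a_dst):
    """Single-head graph attention: neighbor-wise softmax of LeakyReLU(a_src.Wh_i + a_dst.Wh_j)."""
    z = h @ w
    e = (z @ a_src)[:, None] + (z @ a_dst)[None, :]
    e = np.where(e > 0, e, 0.2 * e)                   # LeakyReLU with slope 0.2
    mask = (adj + np.eye(adj.shape[0])) > 0           # attend to neighbors and self only
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)              # stabilize the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

rng = np.random.default_rng(0)
n_samples, n_classes, hidden = 30, 3, 8

# Hypothetical omics matrices (samples x features); real data would replace these.
omics = {
    "mrna": rng.normal(size=(n_samples, 50)),
    "methylation": rng.normal(size=(n_samples, 40)),
}
graphs = {name: knn_graph(x, k=5) for name, x in omics.items()}

# Stage 1: one GCN per omics type, then concatenate the latent representations.
embeddings = [
    gcn_layer(graphs[name], x, rng.normal(scale=0.1, size=(x.shape[1], hidden)))
    for name, x in omics.items()
]
fused_features = np.concatenate(embeddings, axis=1)

# Stage 2: fuse the omics-specific graphs (edge union, an assumption) and apply attention.
fused_graph = np.maximum.reduce(list(graphs.values()))
w = rng.normal(scale=0.1, size=(fused_features.shape[1], n_classes))
a_src = rng.normal(size=n_classes)
a_dst = rng.normal(size=n_classes)
logits, alpha = gat_layer(fused_graph, fused_features, w, a_src, a_dst)
predicted_subtype = logits.argmax(axis=1)
```

Lowering `k` in `knn_graph` keeps fewer, higher-similarity edges per sample, which is one simple way to experiment with the limited-edge hypothesis mentioned above.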
The data and code are available at https://github.com/YD-00/MO-GCAN-Updated.git.