Sobhan Masrur, Islam Md Mezbahul, Mondal Ananda Mohan
Knight Foundation School of Computing and Information Science, Florida International University, Miami, FL 33199, USA.
bioRxiv. 2025 Jan 27:2025.01.24.634827. doi: 10.1101/2025.01.24.634827.
Convolutional neural networks (CNNs) can be applied to non-grid-structured data, such as biological array data, once it has been converted into an image-like format using principal component analysis (PCA) of pathway genes. However, principal components (PCs) derived from the entire dataset capture global variance and do not capture sub-cohort (class-specific) variance. Consequently, CNNs trained on global PCs perform poorly in survival prediction of glioblastoma multiforme (GBM), and the corresponding explanations of CNN outcomes may not align with disease-relevant pathways.
We present PathX-CNN, an explainable CNN framework that addresses these limitations by integrating multi-omics data through pathway-based images derived from sub-cohort-specific PCs. PathX-CNN outperformed existing pathway-based methods in predicting long-term survival (LTS) versus non-LTS in GBM. Using SHAP (SHapley Additive exPlanations), an explainable AI method based on cooperative game theory, PathX-CNN identified biologically plausible pathways associated with GBM survival. In experiments on other cancer types, it also outperformed traditional approaches. PathX-CNN demonstrates the potential of CNNs for multi-omics integration, offering both improved prediction accuracy and pathway-specific insights into disease mechanisms.
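The contrast between global and sub-cohort-specific PCs can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`top_components`, `pathway_image`), the toy data, the three hypothetical pathways, and the choice of two PCs per pathway are all assumptions made for illustration. Each sample becomes a small pathways-by-PCs matrix, and the only difference between the two images is which samples are used to fit the PCA. How the paper treats samples whose class is unknown at prediction time is not stated in the abstract and is not modeled here.

```python
import numpy as np

def top_components(X, k):
    """Return the top-k principal axes (k x n_genes) and the column mean of X."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return vt[:k], mean

def pathway_image(X, pathways, fit_mask, k=2):
    """Build one (n_pathways x k) matrix per sample.

    X        : (n_samples, n_genes) expression-like matrix
    pathways : list of gene-index lists, one per pathway (hypothetical)
    fit_mask : boolean mask selecting the samples used to fit each PCA
    """
    rows = []
    for genes in pathways:
        sub = X[:, genes]
        axes, mean = top_components(sub[fit_mask], k)
        # Project every sample onto axes fitted on the chosen cohort.
        rows.append((sub - mean) @ axes.T)      # (n_samples, k)
    return np.stack(rows, axis=1)               # (n_samples, n_pathways, k)

# Toy data: 40 samples, 12 genes, two classes (0 = non-LTS, 1 = LTS).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))
y = np.array([0] * 20 + [1] * 20)
pathways = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# Global PCs: fitted on all samples.
global_img = pathway_image(X, pathways, np.ones(len(y), dtype=bool))
# Sub-cohort-specific PCs: fitted on the LTS samples only.
lts_img = pathway_image(X, pathways, y == 1)
```

Both arrays have one pathways-by-PCs "image" per sample and could be fed to a 2D CNN after adding a channel axis; stacking images from several omics types along the channel or width axis is one possible way to integrate multi-omics data, again as an assumption rather than a description of PathX-CNN.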