Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America.
Biostatistics Shared Resource, University of Colorado Cancer Center, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America.
PLoS One. 2023 Apr 21;18(4):e0284563. doi: 10.1371/journal.pone.0284563. eCollection 2023.
Network approaches have been used successfully to help reveal the complex mechanisms of diseases, including Chronic Obstructive Pulmonary Disease (COPD). However, despite recent advances, our ability to incorporate protein-protein interaction (PPI) network information with omics data for disease prediction remains limited. New deep learning methods, including convolutional graph neural networks (ConvGNN), have shown great potential for disease classification using transcriptomics data and known PPI networks from existing databases. In this study, we first reconstructed the COPD-associated PPI network with the AhGlasso (Augmented High-Dimensional Graphical Lasso Method) algorithm, using one independent transcriptomics dataset that includes COPD cases and controls. We then extended existing ConvGNN methods to integrate COPD-associated PPI, proteomics, and transcriptomics data and developed a prediction model for COPD classification. This approach improves accuracy over several conventional classification methods and over neural networks that do not incorporate network information. We also demonstrated that the updated COPD-associated network developed using AhGlasso further improves prediction accuracy. Although deep neural networks often achieve superior statistical power in classification compared to other methods, it can be very difficult to explain how a model, especially a graph neural network, makes decisions on the given features, and to identify the features that contribute most to prediction, both overall and for individual subjects. To better explain how the spectral-based graph neural network model works, we applied a unified explainable machine learning method, SHapley Additive exPlanations (SHAP), and identified CXCL11, IL-2, CD48, KIR3DL2, TLR2, BMP10, and several other relevant COPD genes in subnetworks of the ConvGNN model for COPD prediction. Finally, Gene Ontology (GO) enrichment analysis identified glycosaminoglycan, heparin signaling, and carbohydrate derivative signaling pathways as significantly enriched among the most important genes/proteins for COPD classification.
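To make the idea of combining a PPI network with expression data concrete, the following is a minimal sketch of a single graph-convolution propagation step in plain Python. It assumes the widely used first-order approximation of spectral graph convolution, ReLU(D^{-1/2}(A + I)D^{-1/2} X W), which is not necessarily the exact architecture used in this study. The three-node network, the feature values, and the function names (`normalize_adjacency`, `matmul`, `graph_conv`) are hypothetical and chosen only for illustration.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def graph_conv(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical toy network: three nodes (genes/proteins) in a path 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [2.0], [3.0]]  # one made-up expression value per node
weights = [[1.0]]                 # identity weight, for readability only

hidden = graph_conv(adj, features, weights)
```

Each node's output mixes its own value with those of its network neighbours, which is how interaction information can enter the classifier. In a real model the weights would be learned and several such layers would feed a classification head.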