Department of Ophthalmology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, East Qingchun Road, 310016 Zhejiang, China.
College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae443.
Unraveling the intricate network of associations among microRNAs (miRNAs), genes, and diseases is pivotal for deciphering molecular mechanisms, refining disease diagnosis, and crafting targeted therapies. Computational strategies that leverage link prediction within biological graphs offer a cost-efficient alternative to expensive empirical assays. However, while many methods excel at predicting specific associations, such as miRNA-disease associations (MDAs), miRNA-target interactions (MTIs), and disease-gene associations (DGAs), a holistic approach that harnesses diverse data sources for multifaceted association prediction remains largely unexplored. Because in vitro experiments to comprehensively confirm associations are often expensive and time-consuming, high-quality data are limited; the result is a sparse and noisy heterogeneous graph, which hinders accurate prediction of these complex associations. To address this challenge, we propose a novel framework called Global-local aware Heterogeneous Graph Contrastive Learning (GlaHGCL). GlaHGCL combines global and local contrastive learning to improve node embeddings in the heterogeneous graph. In particular, global contrastive learning enhances the robustness of node embeddings against noise by aligning global representations of the original graph and its augmented counterpart. Local contrastive learning enforces representation consistency between functionally similar or connected nodes across diverse data sources, effectively leveraging data heterogeneity and mitigating the issue of data scarcity. The refined node representations are applied to downstream tasks, such as MDA, MTI, and DGA prediction. Experiments show that GlaHGCL outperforms state-of-the-art methods, and case studies further demonstrate its ability to accurately uncover new associations among miRNAs, genes, and diseases. We have made the datasets and source code publicly available at https://github.com/Sue-syx/GlaHGCL.
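The two contrastive objectives described above can be illustrated with a minimal NumPy sketch. This is not the GlaHGCL implementation (see the repository for that); it assumes an InfoNCE-style loss for the local term and a simple cosine alignment of mean-pooled graph summaries for the global term. The function names, the temperature value, the noise-based augmentation, and the random embeddings are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale vectors (last axis) to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce(anchor, positive, temperature=0.5):
    # Local term (assumed form): row i of `anchor` should match row i of
    # `positive`; every other row of `positive` acts as a negative.
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = a @ p.T / temperature                       # (n, n) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def global_alignment(h, h_aug):
    # Global term (assumed form): 1 - cosine similarity between the
    # mean-pooled summaries of the original and the augmented graph.
    g = l2_normalize(h.mean(axis=0))
    g_aug = l2_normalize(h_aug.mean(axis=0))
    return float(1.0 - g @ g_aug)

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                  # toy node embeddings, original graph
h_aug = h + 0.05 * rng.normal(size=h.shape)   # toy "augmented" view: small noise
h_other = rng.normal(size=(8, 16))            # unrelated embeddings for comparison

loss_local_matched = info_nce(h, h_aug)       # consistent views: low loss
loss_local_random = info_nce(h, h_other)      # unrelated views: high loss
loss_global = global_alignment(h, h_aug)      # near 0 when summaries agree
total_loss = loss_global + loss_local_matched # equal weighting is an assumption
```

In this toy setting the local loss is lower for matched views than for unrelated ones, and the global term stays close to zero under mild perturbation, which is the behavior the robustness argument relies on.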