Peng Cheng Laboratory, Shenzhen, 518005, Guangdong, China.
School of Computer Science and Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China.
BMC Genomics. 2023 Jul 27;24(1):424. doi: 10.1186/s12864-023-09501-3.
Non-coding RNAs (ncRNAs) have drawn wide attention in recent years because they play vital roles in life activities. As a complement to wet-lab experiments, computational prediction methods can greatly reduce experimental costs. However, a high proportion of false negatives in the data and insufficient use of multi-source information can degrade the performance of these methods. Furthermore, many computational methods lack robustness and do not generalize well across datasets. In this work, we propose an end-to-end computational framework, called GDCL-NcDA, that combines deep graph learning and deep matrix factorization (DMF) with contrastive learning to identify latent ncRNA-disease associations on diverse multi-source heterogeneous networks (MHNs). The MHNs include different similarity networks and proven associations among ncRNAs (miRNAs, circRNAs, and lncRNAs), genes, and diseases. First, GDCL-NcDA employs a deep graph convolutional network and multiple attention mechanisms to adaptively integrate the multiple sources of the MHNs and reconstruct the ncRNA-disease association graph. Then, GDCL-NcDA applies DMF to the reconstructed graphs to predict latent disease-associated ncRNAs, which reduces the impact of false negatives in the original associations. Finally, GDCL-NcDA uses contrastive learning (CL) to compute a contrastive loss between the reconstructed graphs and the predicted graphs, which improves the generalization and robustness of the framework. The experimental results show that GDCL-NcDA outperforms closely related computational methods. Moreover, case studies demonstrate the effectiveness of GDCL-NcDA in identifying associations between diverse ncRNAs and diseases.
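The three stages named in the abstract (graph convolution, matrix factorization, contrastive loss) can be illustrated with a minimal NumPy sketch. This is not the GDCL-NcDA implementation: it uses a single untrained graph-convolution layer with no attention, a truncated SVD as a stand-in for deep matrix factorization, and an InfoNCE-style loss as one common form of contrastive loss. All function names, the toy association matrix, and the parameters `dim`, `rank`, and `tau` are assumptions made for illustration only.

```python
import numpy as np

def normalize_adj(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN layer."""
    a_hat = a + np.eye(a.shape[0])          # add self-loops, so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w):
    """One graph-convolution step followed by ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style loss: row i of z1 is the positive for row i of z2."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def toy_pipeline(assoc, dim=4, rank=2, seed=0):
    """Hypothetical three-stage sketch on one ncRNA-disease association matrix."""
    rng = np.random.default_rng(seed)
    n_r, n_d = assoc.shape
    # Bipartite adjacency: ncRNA nodes first, then disease nodes.
    adj = np.block([[np.zeros((n_r, n_r)), assoc],
                    [assoc.T, np.zeros((n_d, n_d))]])
    a_norm = normalize_adj(adj)
    # Stage 1: embed nodes (random, untrained weights) and reconstruct scores.
    w = rng.normal(size=(n_r + n_d, dim))
    h = gcn_layer(a_norm, np.eye(n_r + n_d), w)
    recon = h[:n_r] @ h[n_r:].T
    # Stage 2: low-rank factorization of the reconstruction
    # (truncated SVD used here in place of deep matrix factorization).
    u, s, vt = np.linalg.svd(recon, full_matrices=False)
    k = min(rank, len(s))
    pred = (u[:, :k] * s[:k]) @ vt[:k]
    # Stage 3: contrastive loss between reconstructed and predicted rows.
    loss = info_nce(recon, pred)
    return recon, pred, loss

# Toy data: 4 ncRNAs x 3 diseases, 1 = known association (made up for the example).
assoc = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 0],
                  [0, 0, 1]], dtype=float)
recon, pred, loss = toy_pipeline(assoc)
```

In a trained model the layer weights and factor matrices would be learned jointly by minimizing a prediction loss plus the contrastive term; here they are fixed, so the sketch only shows how the pieces connect.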