Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
School of Artificial Intelligence, Jilin University, Changchun 130012, China.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab604.
Predicting disease-related long non-coding RNAs (lncRNAs) can identify candidate biomarkers for disease diagnosis and treatment. Effective computational approaches for predicting lncRNA-disease associations (LDAs) can provide insights into the pathogenesis of complex human diseases and reduce experimental costs. However, few existing methods use microRNA (miRNA) information to assist prediction, and few consider the complex relationships between the inter-graph and the intra-graph within a complex-graph.
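To make the intra-graph/inter-graph distinction concrete, here is a minimal sketch. It assumes the complex-graph is a single block adjacency matrix over all lncRNA, miRNA and disease nodes, with same-type similarity matrices on the diagonal blocks (the intra-graph) and cross-type association matrices off the diagonal (the inter-graph). The function name `build_graphs`, the node ordering and the exact block layout are illustrative assumptions, not necessarily how MGATE constructs its graphs.

```python
import numpy as np


def build_graphs(ll_sim, mm_sim, dd_sim, lm_assoc, ld_assoc, md_assoc):
    """Build complex-, intra- and inter-graph adjacency matrices (hypothetical layout).

    Node order is assumed to be: lncRNAs, then miRNAs, then diseases.
    ll_sim, mm_sim, dd_sim: symmetric same-type similarity matrices.
    lm_assoc (n_l x n_m), ld_assoc (n_l x n_d), md_assoc (n_m x n_d): known associations.
    """
    n_l, n_m, n_d = ll_sim.shape[0], mm_sim.shape[0], dd_sim.shape[0]
    zeros = np.zeros

    # Intra-graph: only edges between nodes of the same type (diagonal blocks).
    intra = np.block([
        [ll_sim, zeros((n_l, n_m)), zeros((n_l, n_d))],
        [zeros((n_m, n_l)), mm_sim, zeros((n_m, n_d))],
        [zeros((n_d, n_l)), zeros((n_d, n_m)), dd_sim],
    ])

    # Inter-graph: only edges between nodes of different types (off-diagonal blocks),
    # mirrored with transposes so the adjacency matrix stays symmetric.
    inter = np.block([
        [zeros((n_l, n_l)), lm_assoc, ld_assoc],
        [lm_assoc.T, zeros((n_m, n_m)), md_assoc],
        [ld_assoc.T, md_assoc.T, zeros((n_d, n_d))],
    ])

    # Complex-graph: the union of both edge sets.
    return intra + inter, intra, inter


if __name__ == "__main__":
    # Toy data: 2 lncRNAs, 1 miRNA, 2 diseases.
    ll = np.array([[1.0, 0.3], [0.3, 1.0]])
    mm = np.array([[1.0]])
    dd = np.array([[1.0, 0.5], [0.5, 1.0]])
    lm = np.array([[1.0], [0.0]])
    ld = np.array([[0.0, 1.0], [1.0, 0.0]])
    md = np.array([[1.0, 0.0]])
    complex_graph, intra, inter = build_graphs(ll, mm, dd, lm, ld, md)
    print(complex_graph.shape)  # (5, 5)
```

Keeping the three matrices separate makes it straightforward to run one encoder per graph, which is the kind of multi-channel setup the model description relies on.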
In this paper, we introduce the relationships between nodes of the same type and nodes of different types in a complex-graph, and propose a multi-channel graph attention autoencoder model, called MGATE, to predict LDAs. First, an lncRNA-miRNA-disease complex-graph is established from the similarities and correlations among lncRNAs, miRNAs and diseases, integrating the complex associations among them. Second, to fully extract comprehensive node information, we use graph autoencoder networks to learn multiple representations from the complex-graph, the inter-graph and the intra-graph. Third, a graph-level attention integration module adaptively merges the three representations, and a combined training strategy optimizes the whole model to ensure complementarity and consistency among the multi-graph embedding representations. Finally, after exploring multiple classifiers, we use Random Forest to predict the association score between each lncRNA and disease. Experimental results on a public dataset show that MGATE achieves an area under the receiver operating characteristic curve (AUC) of 0.964 and an area under the precision-recall curve (AUPR) of 0.413, significantly outperforming seven state-of-the-art methods. Furthermore, case studies of three cancers demonstrate the ability of MGATE to identify potential disease-related candidate lncRNAs. The source code and supplementary data are available at https://github.com/sheng-n/MGATE.
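The graph-level attention step can be illustrated with a minimal sketch. It assumes a view-level attention in the style of heterogeneous graph attention networks (HAN): each view's embedding matrix (complex-graph, inter-graph, intra-graph) receives one scalar score, the scores are softmax-normalized, and the views are summed with those weights. The projection `W`, bias `b` and query `q` are random in the demo but would be learned during combined training in practice. The helper `pair_features`, which concatenates lncRNA and disease embeddings into classifier input, is a hypothetical choice and not necessarily how MGATE feeds its Random Forest.

```python
import numpy as np


def graph_level_attention(views, W, b, q):
    """Fuse per-view node embeddings with view-level attention (HAN-style sketch).

    views: list of K arrays, each (n_nodes, d), e.g. complex-, inter-, intra-graph embeddings.
    W: (h, d) projection, b: (h,) bias, q: (h,) attention query vector.
    Returns the fused (n_nodes, d) embedding and the K attention weights.
    """
    # One scalar score per view: mean over nodes of q^T tanh(W z + b).
    scores = np.array([np.mean(np.tanh(z @ W.T + b) @ q) for z in views])
    # Softmax over views, shifted by the max for numerical stability.
    exp = np.exp(scores - scores.max())
    beta = exp / exp.sum()
    # Attention-weighted sum of the view embeddings.
    fused = sum(w * z for w, z in zip(beta, views))
    return fused, beta


def pair_features(fused, lnc_idx, dis_idx):
    """Hypothetical pair encoding: concatenate fused lncRNA and disease embeddings."""
    return np.hstack([fused[lnc_idx], fused[dis_idx]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_nodes, d, h = 5, 4, 3
    views = [rng.normal(size=(n_nodes, d)) for _ in range(3)]
    W, b, q = rng.normal(size=(h, d)), np.zeros(h), rng.normal(size=h)
    fused, beta = graph_level_attention(views, W, b, q)
    # Candidate pairs (lncRNA index, disease index) in the shared node ordering.
    X = pair_features(fused, np.array([0, 1]), np.array([3, 4]))
    print(beta, X.shape)
```

Because the weights `beta` sum to one, a view carrying little signal is down-weighted rather than discarded. The resulting pair features could then be passed to any classifier exposing `fit`/`predict_proba`, such as scikit-learn's `RandomForestClassifier`.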