Xuan Ping, Zhao Yue, Cui Hui, Zhan Linyun, Jin Qiangguo, Zhang Tiangang, Nakaguchi Toshiya
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1480-1491. doi: 10.1109/TCBB.2022.3209571. Epub 2023 Apr 3.
Since abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases, identifying disease-related lncRNAs helps reveal the pathogenesis of diseases. Existing methods for lncRNA-disease association prediction mainly focus on multi-sourced data related to lncRNAs and diseases, and neglect the rich semantic information of meta-paths, which are composed of multiple kinds of connections between lncRNA and disease nodes. We propose a new prediction method, MGLDA, to encode and integrate the semantics of multiple meta-paths, the global topology of the heterogeneous graph, and the pairwise attributes of lncRNA and disease nodes. First, a tri-layer heterogeneous graph is constructed to associate multi-sourced data across the lncRNA, disease, and miRNA nodes. Next, we establish multiple meta-paths connecting the lncRNA and disease nodes to derive and denote various semantics. Each meta-path carries its own semantics, formulated by an embedding strategy, and each embedding covers the local topology formed by the diverse semantic connections among the lncRNA, disease, and miRNA nodes. We construct multiple graph convolutional autoencoders (GCAs) with topology-level attention to learn the global topology from the tri-layer graph and a local topology from each meta-path. The topology-level attention mechanism learns the importance of the global and local topologies for adaptive pairwise topology fusion. Finally, a convolutional autoencoder learns the attribute representations of lncRNA-disease pairs, integrating the learnt detailed and representative pairwise features. Experimental results show that MGLDA outperforms the compared state-of-the-art prediction methods and retrieves more real lncRNA-disease associations among the top-ranked candidates. The ablation study also demonstrates the important contributions of local and global topology learning and of pairwise attribute learning. Case studies on three diseases further demonstrate MGLDA's ability to identify potential disease-related lncRNAs.
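As a rough illustration of two ideas named in the abstract, the sketch below composes meta-paths between lncRNA and disease nodes by multiplying hop matrices, and fuses several topology embeddings with softmax weights. It is not the MGLDA implementation: the function names (`metapath_adjacency`, `attention_fuse`), the toy matrices, and the fixed attention scores are all assumptions made for the example. In the method itself the embeddings come from graph convolutional autoencoders and the attention scores are learned.

```python
import numpy as np


def metapath_adjacency(*hops):
    """Chain hop matrices along a meta-path and binarize the product.

    Hypothetical helper: entry (i, j) is 1 when at least one path of the
    given node-type sequence links source node i to target node j.
    """
    result = hops[0]
    for hop in hops[1:]:
        result = result @ hop
    return (result > 0).astype(int)


def attention_fuse(embeddings, scores):
    """Softmax-weighted sum of per-topology embeddings.

    Assumption: `scores` holds one scalar per topology. Here they are
    fixed numbers chosen for illustration, not learned parameters.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    stacked = np.stack(embeddings)           # (n_topologies, n_rows, dim)
    fused = np.tensordot(weights, stacked, axes=1)
    return fused, weights


# Toy tri-layer graph (made-up data): 3 lncRNAs, 2 miRNAs, 2 diseases.
lnc_mi = np.array([[1, 0],
                   [0, 1],
                   [0, 0]])       # lncRNA-miRNA interactions
mi_dis = np.array([[1, 0],
                   [0, 1]])       # miRNA-disease associations
lnc_lnc = np.array([[1, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1]])   # thresholded lncRNA similarity
lnc_dis = np.array([[0, 0],
                    [0, 1],
                    [0, 0]])      # known lncRNA-disease associations

# Meta-path lncRNA -> miRNA -> disease.
lmd = metapath_adjacency(lnc_mi, mi_dis)
# Meta-path lncRNA -> lncRNA -> disease.
lld = metapath_adjacency(lnc_lnc, lnc_dis)

# Stand-in "embeddings": the meta-path matrices themselves, as floats.
fused, weights = attention_fuse([lmd.astype(float), lld.astype(float)],
                                scores=[0.0, 0.0])
```

With equal scores the fusion reduces to a plain average of the two meta-path views. Unequal scores would let one topology dominate, which is the role the abstract assigns to the topology-level attention.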