IEEE J Biomed Health Inform. 2022 Jun;26(6):2839-2849. doi: 10.1109/JBHI.2021.3130110. Epub 2022 Jun 3.
Abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases, and identifying disease-related lncRNAs can help clarify complex disease pathogenesis. Recent methods for lncRNA-disease association prediction rely on diverse data about lncRNAs and diseases. These methods, however, do not adequately integrate the neighbour topological information of lncRNA and disease nodes. Moreover, more intrinsic features of lncRNA-disease node pairs can be explored to better predict their latent associations. We developed a novel method, named GTAN, to predict the association propensities between lncRNAs and diseases. GTAN integrates various information about lncRNAs and diseases and exploits the neighbour topology and attribute representations of a pair of lncRNA-disease nodes. GTAN adopts a graph neural network architecture with three attention mechanisms and multi-layer convolutional neural networks (CNNs). First, a neighbour-level self-attention mechanism is constructed to learn the importance of each neighbour for a lncRNA or disease node of interest. Second, topology-level attention is proposed to enhance contextual dependencies among multiple local topology representations. An attention-enhanced graph neural network framework is then established to learn a topology representation of the top-ranked neighbours. GTAN also uses attribute-level attention to distinguish the varying contributions of the attributes of the lncRNA-disease pair. Finally, an attribute representation is learned by a multi-layer CNN to integrate detailed features and representative features of the pair. Extensive experimental results demonstrated that GTAN outperformed state-of-the-art methods. Ablation studies confirmed the important contributions of the three attention mechanisms. Case studies on three cancers further showed GTAN's ability to discover potential lncRNA candidates related to diseases.
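The neighbour-level attention step can be illustrated with a minimal sketch. The abstract does not specify how GTAN scores neighbours, so the scaled dot-product scoring, the renormalisation over the retained neighbours, the helper names `softmax` and `neighbour_attention`, and the toy feature vectors below are all assumptions made for illustration rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbour_attention(node, neighbours, top_k):
    """Hypothetical helper: weight a node's neighbours, keep the top_k.

    Assumption: importance is a scaled dot product between the node's
    feature vector and each neighbour's feature vector. The abstract
    does not state the scoring function GTAN actually uses.
    """
    scale = math.sqrt(len(node))
    scores = [sum(a * b for a, b in zip(node, nb)) / scale for nb in neighbours]
    weights = softmax(scores)

    # Rank neighbours by attention weight and keep the top_k indices.
    order = sorted(range(len(neighbours)), key=lambda i: weights[i], reverse=True)
    ranked = order[:top_k]

    # Renormalise the kept weights and aggregate the neighbour features.
    norm = sum(weights[i] for i in ranked)
    summary = [
        sum(weights[i] / norm * neighbours[i][d] for i in ranked)
        for d in range(len(node))
    ]
    return ranked, summary


# Toy example: one node of interest with three candidate neighbours.
node = [1.0, 0.0]
neighbours = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ranked, summary = neighbour_attention(node, neighbours, top_k=2)
print(ranked, summary)
```

Here the neighbour most similar to the node of interest receives the largest weight, and only the two highest-weighted neighbours contribute to the aggregated vector. In a model such as the one described, a vector of this kind would feed the later topology-level stages; that wiring is not shown.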