School of Information Science and Technology, Northeast Normal University, Changchun 130117, China.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab407.
Discovering long noncoding RNA (lncRNA)-disease associations is a fundamental and critical part of understanding disease etiology and pathogenesis. However, only a few lncRNA-disease associations have been identified, because the biological experiments required are time-consuming and expensive. An efficient computational method for identifying potential lncRNA-disease associations is therefore urgently needed. Because they can exploit node features and relationships in a network, graph-based learning models have been commonly used for such biomolecular association prediction tasks. However, the ability of these methods to comprehensively fuse node features, heterogeneous topological structures and semantic information is far from optimal, or even satisfactory. Moreover, they remain limited in modeling complex associations between lncRNAs and diseases.
In this paper, we develop a novel heterogeneous graph attention network framework based on meta-paths for predicting lncRNA-disease associations, denoted as HGATLDA. First, we construct a heterogeneous network by incorporating the lncRNA and disease feature structural graphs and the lncRNA-disease topological structural graph. Then, from this heterogeneous graph, we construct multiple metapath-based subgraphs and use a graph attention network to learn node embeddings from neighbors in these homogeneous and heterogeneous subgraphs. Next, we apply an attention mechanism to adaptively assign weights to the metapath-based subgraphs and capture more semantic information. In addition, we incorporate neural inductive matrix completion to reconstruct lncRNA-disease associations, which captures the complicated associations between lncRNAs and diseases. Moreover, we incorporate a cost-sensitive neural network into the loss function to tackle the common imbalance problem in lncRNA-disease association prediction. Finally, extensive experimental results demonstrate the effectiveness of our proposed framework.
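Two of the steps above, attention-weighted fusion of metapath-based embeddings and a cost-sensitive loss, can be illustrated with a minimal sketch. The abstract gives no formulas, so this is not the HGATLDA implementation: the function names, the softmax weighting over per-metapath scores, and the `pos_weight` form of the weighted cross-entropy are all assumptions chosen for illustration. In a trained model the scores would be learned rather than passed in.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_metapath_embeddings(embeddings, scores):
    """Fuse one node's per-metapath embeddings into a single vector.

    embeddings: list of equal-length vectors, one per metapath subgraph.
    scores: one importance score per metapath (hypothetical input here;
            a real model would learn these).
    Returns the weighted sum and the attention weights.
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return fused, weights


def cost_sensitive_bce(y_true, y_score, pos_weight, eps=1e-12):
    """Weighted binary cross-entropy (assumed form of a cost-sensitive loss).

    Errors on known associations (label 1) are scaled by pos_weight,
    which counteracts the scarcity of positive lncRNA-disease pairs.
    """
    total = 0.0
    for y, p in zip(y_true, y_score):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    fused, weights = fuse_metapath_embeddings(
        [[1.0, 0.0], [0.0, 1.0]], scores=[2.0, 0.5]
    )
    print("attention weights:", weights)
    print("fused embedding:", fused)
    print("loss:", cost_sensitive_bce([1, 0, 0], [0.3, 0.2, 0.1], pos_weight=5.0))
```

With `pos_weight` set to 1 the loss reduces to the ordinary binary cross-entropy; larger values penalize missed associations more heavily than false alarms.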