College of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China.
Bioinformatics. 2020 Aug 15;36(16):4466-4472. doi: 10.1093/bioinformatics/btaa428.
Although long non-coding RNAs (lncRNAs) have limited capacity for encoding proteins, they have been verified as biomarkers in the occurrence and development of complex diseases. Recent wet-lab experiments have shown that lncRNAs function by regulating the expression of protein-coding genes (PCGs), which could also be the mechanism by which they contribute to disease. lncRNA-related biological data are increasing rapidly. However, no computational methods have been designed for predicting novel target genes of lncRNAs.
In this study, we present a graph convolutional network (GCN)-based method, named DeepLGP, for prioritizing target PCGs of lncRNAs. First, gene and lncRNA features were selected; these included their location in the genome, their expression in 13 tissues and miRNA-mediated lncRNA-gene pairs. Next, a GCN was applied to convolve a gene interaction network, encoding the features of genes and lncRNAs. These encoded features were then passed to a convolutional neural network to prioritize the target genes of lncRNAs. In 10-fold cross-validation on two independent datasets, DeepLGP obtained high areas under the curve (AUC; 0.90-0.98) and areas under the precision-recall curve (0.91-0.98). We found that lncRNA pairs with high similarity had more overlapping target genes. Further experiments showed that genes targeted by the same set of lncRNAs had a strong likelihood of causing the same diseases, which could help in identifying disease-causing PCGs.
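The graph-convolution step described above can be illustrated with a minimal sketch. The abstract does not state which GCN formulation DeepLGP uses, so the code below assumes the widely used symmetric-normalization propagation rule, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). The function names, the toy four-node network, and the feature and weight sizes are all hypothetical and are not taken from the DeepLGP repository.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degree of each node, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One graph-convolution layer: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Hypothetical toy interaction network: 4 nodes connected in a chain (0-1-2-3).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # assumed: 3 input features per node
weights = rng.normal(size=(3, 2))    # assumed: 2-dimensional embedding, random (untrained) weights

adj_norm = normalize_adjacency(adj)
embeddings = gcn_layer(adj_norm, features, weights)
print(embeddings.shape)
```

In a trained model the weight matrix would be learned rather than drawn at random, and the resulting node embeddings would be handed to a downstream classifier, which in the method described here is a convolutional neural network.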
https://github.com/zty2009/LncRNA-target-gene.
Supplementary data are available at Bioinformatics online.