IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):407-416. doi: 10.1109/TCBB.2017.2704587. Epub 2017 May 16.
Accumulating evidence shows that long non-coding RNAs (lncRNAs) play key roles in cellular biological processes such as epigenetic regulation, regulation of gene expression at the transcriptional and post-transcriptional levels, and cell differentiation. However, most lncRNAs have not been functionally characterized, so there is an urgent need for computational approaches that can annotate the functions of the growing number of available lncRNAs. In this article, we propose a global network-based method, KATZLGO, to predict the functions of human lncRNAs at a large scale. A global network is constructed by integrating three heterogeneous networks: an lncRNA-lncRNA similarity network, an lncRNA-protein association network, and a protein-protein interaction network. The KATZ measure is then employed to calculate similarities between lncRNAs and proteins in the global network. Based on the KATZ similarity scores, we annotate lncRNAs with the Gene Ontology (GO) terms of their neighboring protein-coding genes. The performance of KATZLGO is evaluated on a manually annotated lncRNA benchmark and on a protein-coding gene benchmark with known function annotations. KATZLGO significantly outperforms the state-of-the-art computational method in both maximum F-measure and coverage. Furthermore, we apply KATZLGO to predict the functions of human lncRNAs and successfully map 12,318 human lncRNA genes to GO terms.
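The pipeline described above (stack three networks into one global adjacency matrix, compute Katz similarities, then transfer GO terms from high-scoring proteins) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `katz_scores` and `annotate_lncrnas`, the damping factor `beta`, the `top_k` cutoff, and the toy data are all assumptions made for illustration. It uses the standard closed form of the Katz index, S = (I - beta*A)^-1 - I, which equals the sum of beta^l * A^l over all path lengths l >= 1 when beta is below the reciprocal of the spectral radius of A; the abstract does not say whether KATZLGO uses this closed form or a truncated sum.

```python
import numpy as np

def katz_scores(ll, lp, pp, beta):
    """Return the lncRNA-by-protein block of the Katz similarity matrix.

    ll: symmetric lncRNA-lncRNA similarity matrix (n_l x n_l)
    lp: lncRNA-protein association matrix (n_l x n_p)
    pp: symmetric protein-protein interaction matrix (n_p x n_p)
    beta: damping factor (assumed parameter); must be < 1 / spectral radius
    """
    n_l = ll.shape[0]
    # Global adjacency matrix: lncRNA nodes first, then protein nodes.
    a = np.block([[ll, lp], [lp.T, pp]]).astype(float)
    if not np.allclose(a, a.T):
        raise ValueError("ll and pp must be symmetric")
    # The infinite Katz series converges only below 1 / spectral radius.
    rho = np.max(np.abs(np.linalg.eigvalsh(a)))
    if rho > 0 and beta >= 1.0 / rho:
        raise ValueError("beta must be smaller than 1 / spectral radius")
    n = a.shape[0]
    # Closed form of sum_{l>=1} beta^l A^l.
    s = np.linalg.inv(np.eye(n) - beta * a) - np.eye(n)
    return s[:n_l, n_l:]

def annotate_lncrnas(scores, protein_go, top_k=2):
    """Score GO terms per lncRNA by summing the Katz scores of its top_k proteins."""
    annotations = []
    for row in scores:
        top = np.argsort(row)[::-1][:top_k]
        term_scores = {}
        for j in top:
            if row[j] <= 0:
                continue  # unreachable proteins contribute nothing
            for term in protein_go[j]:
                term_scores[term] = term_scores.get(term, 0.0) + float(row[j])
        annotations.append(term_scores)
    return annotations

# Toy network (made-up data): lncRNA1 - lncRNA0 - protein0 - protein1,
# with protein2 disconnected from everything else.
ll = np.array([[0, 1], [1, 0]])
lp = np.array([[1, 0, 0], [0, 0, 0]])
pp = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
protein_go = [["GO:0006355"], ["GO:0030154"], ["GO:0008150"]]  # illustrative labels

scores = katz_scores(ll, lp, pp, beta=0.3)
for i, terms in enumerate(annotate_lncrnas(scores, protein_go)):
    print(f"lncRNA{i}:", sorted(terms.items(), key=lambda kv: -kv[1]))
```

Note that lncRNA1 has no direct protein association in the toy data, yet it still receives scores through its similarity to lncRNA0. This is the practical appeal of a global-network measure: lncRNAs without known protein partners can still be annotated through longer paths.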