College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China.
Brief Bioinform. 2017 Mar 1;18(2):236-249. doi: 10.1093/bib/bbw015.
Long noncoding RNAs (lncRNAs) are emerging as an important class of regulators that participate in various biological functions and disease processes. With the widespread application of next-generation sequencing technologies, large numbers of lncRNAs have been identified, yielding many lncRNA annotation resources across different contexts. However, a comprehensive overview of these lncRNA annotation resources is currently lacking. In this study, we reviewed 24 currently available lncRNA annotation resources covering >205,000 lncRNAs in over 50 tissues and cell lines. We characterized these annotation resources from several aspects, including exon structure, expression, histone modification and function, and found many distinct properties among them. In particular, the resources showed diverse chromatin signatures, remarkable tissue and cell type dependence, and functional specificity. Our results suggest that current lncRNA annotations are incomplete yet complementary, and that multiple resources must be integrated to characterize lncRNAs comprehensively. Finally, we developed 'LNCat' (lncRNA atlas, freely available at http://biocc.hrbmu.edu.cn/LNCat/), a user-friendly database that provides a genome browser of lncRNA structures, visualization of different resources from multiple angles and download of different combinations of lncRNA annotations, and that supports rapid exploration, comparison and integration of lncRNA annotation resources. Overall, our study provides a comprehensive comparison of numerous lncRNA annotations and can facilitate the understanding of lncRNAs in human disease.