Bu DeChao, Luo HaiTao, Jiao Fei, Fang ShuangSang, Tan ChengFu, Liu ZhiYong, Zhao Yi
Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
Sci China Life Sci. 2015 Aug;58(8):787-98. doi: 10.1007/s11427-015-4881-9. Epub 2015 Jun 27.
Mammalian genomes contain tens of thousands of long non-coding RNAs (lncRNAs) that have been implicated in diverse biological processes. However, the lncRNA transcriptomes of most mammalian species have not been established, limiting the evolutionary annotation of these novel transcripts. Based on RNA sequencing data from six tissues of nine species, we built comprehensive lncRNA catalogs (4,142-42,558 lncRNAs) covering the major mammalian species. Compared to protein-coding RNAs, expression of lncRNAs exhibits striking lineage specificity. Notably, although 30%-99% of human lncRNAs are conserved across different species at the DNA locus level, only 20%-27% of these conserved lncRNA loci show detectable transcription, in stark contrast to the proportion for conserved protein-coding genes (48%-80%). This finding provides a valuable resource for experimental scientists to study the mechanisms of lncRNAs. Moreover, we constructed lncRNA expression phylogenetic trees across nine mammals and demonstrated that lncRNA expression profiles can reliably determine phylogenetic placement in a manner similar to their coding counterparts. Our data also reveal that the evolutionary rate of lncRNA expression varies among tissues and is significantly higher than that of protein-coding genes. To streamline the processes of browsing lncRNAs and detecting their evolutionary statuses, we integrated all the data produced in this study into a database named PhyloNONCODE (http://www.bioinfo.org/phyloNoncode). Our work starts to place mammalian lncRNAs in an evolutionary context and represents a rich resource for comparative and functional analyses of this critical layer of the genome.