Department of Physiology, Anatomy, and Genetics, Le Gros Clark Building, South Parks Road, University of Oxford, Oxford OX1 3QX, UK.
Genome Biol. 2010;11(7):R72. doi: 10.1186/gb-2010-11-7-r72. Epub 2010 Jul 12.
Although protein has long been considered the building block of life, it is now apparent that it is only one of many functional products generated by the eukaryotic genome. Indeed, more of the human genome is transcribed into noncoding sequence than into protein-coding sequence. Nevertheless, whilst we have developed a deep understanding of the relationships between evolutionary constraint and function for protein-coding sequence, little is known about these relationships for noncoding transcribed sequence. This dearth of information is partially attributable to a lack of established non-protein-coding RNA (ncRNA) orthologs among birds and mammals within sequence and expression databases.
Here, we performed a multi-disciplinary study of four highly conserved and brain-expressed transcripts selected from a list of mouse long intergenic noncoding RNA (lncRNA) loci that generally show pronounced evolutionary constraint within their putative promoter regions and across exon-intron boundaries. We identify some of the first lncRNA orthologs present in birds (chicken), marsupials (opossum), and eutherian mammals (mouse), and investigate whether they exhibit conservation of brain expression. In contrast to conventional protein-coding genes, the sequences, transcriptional start sites, exon structures, and lengths of these noncoding genes are all highly variable.
The biological relevance of lncRNAs would be highly questionable if they were limited to closely related phyla. Instead, their preservation across diverse amniotes, their apparent conservation in exon structure, and similarities in their pattern of brain expression during embryonic and early postnatal stages together indicate that these are functional RNA molecules, of which some have roles in vertebrate brain development.