Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.
Development. 2013 Jul;140(13):2828-34. doi: 10.1242/dev.098343. Epub 2013 May 22.
Large-scale genomics and computational approaches have identified thousands of putative long non-coding RNAs (lncRNAs). What fraction of these RNAs is truly non-coding, however, remains controversial. Here, we combine ribosome profiling with a machine-learning approach to validate lncRNAs during zebrafish development in a high-throughput manner. We find that dozens of proposed lncRNAs are protein-coding contaminants and that many lncRNAs have ribosome profiles that resemble the 5' leaders of coding RNAs. Analysis of ribosome profiling data from embryonic stem cells reveals similar properties for mammalian lncRNAs. These results clarify the annotation of developmental lncRNAs and suggest a potential role for translation in lncRNA regulation. In addition, our computational pipeline and ribosome profiling data provide a powerful resource for the identification of translated open reading frames during zebrafish development.