Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France.
Methods Mol Biol. 2022;2405:63-82. doi: 10.1007/978-1-0716-1855-4_3.
Recent studies attribute a central role to the noncoding genome in the emergence of novel genes. The widespread transcription of noncoding regions and the pervasive translation of the resulting RNAs offer organisms a vast reservoir of novel peptides. Although most of these peptides are expected to be deleterious or neutral, and thus to be degraded rapidly or to be short-lived in evolutionary history, some of them can confer an advantage on the organism. These can then be subjected to natural selection and become established as novel genes. In either case, characterizing the structural properties of these pervasively translated peptides is crucial to understand (1) their impact on the cell and (2) how some of these peptides, derived from presumed noncoding regions, can give rise to structured and functional de novo proteins. We therefore present a protocol for exploring the potential of a genome to produce novel peptides. It consists of annotating all the open reading frames (ORFs) of a genome (i.e., both coding and noncoding ones) and characterizing the fold potential and other structural properties of the corresponding potential peptides. Here, we apply our protocol to a small genome and show how to apply it to very large genomes. Finally, we present a case study that probes the fold potential of a set of 721 ORFs in mouse lncRNAs that were identified as translated in ribosome profiling experiments. Interestingly, we show that the distribution of their fold potential differs from that of the nontranslated lncRNAs and, more generally, from that of the other noncoding ORFs of the mouse.
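To make the ORF-annotation step more concrete, here is a minimal Python sketch that enumerates ORFs on both strands of a DNA sequence and translates each one into its potential peptide. It is not the authors' tool and does not estimate fold potential. The ORF definition (an ATG followed by the first in-frame stop codon), the `min_codons` threshold, and the names `ORF`, `find_orfs`, `translate`, and `reverse_complement` are all illustrative assumptions.

```python
from typing import Iterator, NamedTuple

# Standard genetic code, built from the classic TCAG ordering:
# the codon with bases (i, j, k) maps to AMINO[16*i + 4*j + k]; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class ORF(NamedTuple):
    strand: str   # "+" (input sequence) or "-" (its reverse complement)
    frame: int    # reading frame 0, 1 or 2 on that strand
    start: int    # 0-based start of the ATG, in that strand's own coordinates
    end: int      # exclusive end, just after the stop codon
    peptide: str  # translated product, stop codon excluded


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(cds: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def find_orfs(seq: str, min_codons: int = 20) -> Iterator[ORF]:
    """Yield ATG-to-stop ORFs on both strands (hypothetical definition).

    Only the first ATG before each stop is used, so nested in-frame ATGs
    do not produce separate ORFs. Reading frames still open at the end of
    the sequence are ignored. min_codons counts the stop codon and is an
    arbitrary illustrative threshold.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield ORF(strand, frame, start, i + 3, translate(s[start:i]))
                    start = None
```

For example, `list(find_orfs("ATGAAATTTTAA", min_codons=1))` returns a single forward-strand ORF whose peptide is `"MKF"`. A genome-wide version would apply the same scan to every chromosome and then label each ORF as coding or noncoding by comparing it with the existing gene annotation; that comparison is left out here.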