Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution.

Author information

Papadopoulos Chris, Callebaut Isabelle, Gelly Jean-Christophe, Hatin Isabelle, Namy Olivier, Renard Maxime, Lespinet Olivier, Lopes Anne

Affiliations

Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France.

Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France.

Publication information

Genome Res. 2021 Dec;31(12):2303-2315. doi: 10.1101/gr.275638.121. Epub 2021 Nov 22.

Abstract

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how the properties of noncoding sequences could promote the birth of novel genes and shape the evolution and structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of Saccharomyces cerevisiae with the aim of (1) exploring whether the structural state diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that the amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes, and we characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and that of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
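The abstract does not state how the intergenic ORFs were delimited or how fold potential was scored, so the sketch below is only an illustration of the general idea of "amino acid sequences encoded by ORFs", not the authors' pipeline. It assumes ORFs run from an ATG to the first in-frame stop codon on either strand, uses a hypothetical 20-codon minimum length, and applies the standard genetic code. Restricting the scan to intergenic regions would additionally require a genome annotation, which is not modeled here.

```python
# Illustrative sketch only (not the paper's method): find ATG...stop ORFs in
# all six reading frames and translate them with the standard genetic code.
# Assumptions: ATG start, first in-frame stop ends the ORF, hypothetical
# minimum length of 20 codons, no intergenic filtering.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; stops become '*', unknown codons 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq: str, min_codons: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf_dna) for each ATG...stop ORF.

    Each ORF starts at the first ATG of an open stretch (nested ATGs are
    ignored) and includes its terminal stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the opening ATG, if inside an ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
    return orfs
```

For example, `find_orfs("ATGAAATAG", min_codons=1)` yields one forward-strand ORF, and `translate` turns it into `"MK*"`. A stop-to-stop definition, which some studies use for noncoding ORFs, would only change where `start` is reset.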

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf6b/8647833/789bf4c1b4b3/2303f01.jpg