Bourret Jérôme, Borvető Fanni, Bravo Ignacio G
Laboratoire MIVEGEC (CNRS IRD Univ Montpellier), Centre National de la Recherche Scientifique (CNRS), Montpellier, France.
J Evol Biol. 2023 Oct;36(10):1375-1392. doi: 10.1111/jeb.14212. Epub 2023 Sep 5.
Gene paralogs are copies of an ancestral gene that arise after gene or whole-genome duplication. When two sister gene copies are maintained in the genome, redundancy may relax certain evolutionary pressures, allowing one of them to acquire novel functions. Here, we study the evolutionary history of the three polypyrimidine tract binding protein (PTBP) paralogs and the concurrent evolution of their differential codon usage preferences (CUPrefs) in vertebrate species. PTBP1-3 show high identity at the amino acid level (up to 80%) but display strongly different nucleotide composition, divergent CUPrefs and, in humans and in many other vertebrates, distinct tissue-specific expression levels. Our phylogenetic inference results show that the duplication events leading to the three extant PTBP1-3 lineages predate the basal diversification within vertebrates, and genomic context analysis shows that local synteny has been well preserved over time for the three paralogs. We identify a distinct evolutionary pattern towards GC3-enriching substitutions in PTBP1, concurrent with enrichment in frequently used codons and with tissue-wide expression. In contrast, PTBP2s are enriched in AT-ending, rare codons and display tissue-restricted expression. As a result of this substitution trend, CUPrefs differ sharply between mammalian PTBP1s and the rest of the PTBPs. Genomic context analysis suggests that the GC3-rich nucleotide composition of PTBP1s is driven by local substitution processes, while the evidence in this direction is thinner for PTBP2-3. A genuine lack of co-variation between the observed GC composition of PTBP2-3 and that of the surrounding non-coding genomic environment would raise questions about the origin of their CUPrefs, warranting further research on a putative tissue-specific translational selection. Finally, we report an intriguing trend in the use of the UUG-Leu codon, which matches the trends of AT-ending codons.
Our results are compatible with a scenario in which a combination of directional mutation-selection processes would have differentially shaped the CUPrefs of PTBPs in vertebrates: the observed GC-enrichment of PTBP1 in placental mammals may be linked to genomic location and to strong and broad tissue expression, while the AT-enrichment of PTBP2 and PTBP3 would be associated with rare CUPrefs and thus, possibly, with specialized spatio-temporal expression. Our interpretation is consistent with a gene subfunctionalisation process by differential expression regulation associated with the evolution of specific CUPrefs.
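For readers unfamiliar with the quantities discussed above, GC3 is commonly understood as the fraction of codons in a coding sequence whose third position is G or C, and the preference for UUG-Leu can be expressed as its share among the six synonymous leucine codons. The following is a minimal Python sketch of both measures. It is an illustration only, not the pipeline used in the study: the function names `gc3_fraction` and `leucine_codon_shares` are hypothetical, the input is assumed to be an in-frame coding sequence, and the toy sequences are not PTBP sequences.

```python
from collections import Counter

# The six synonymous leucine codons in the standard genetic code (DNA alphabet).
LEU_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc3_fraction(cds: str) -> float:
    """Fraction of codons whose third position is G or C (assumed definition of GC3)."""
    codons = split_codons(cds)
    if not codons:
        raise ValueError("sequence is shorter than one codon")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def leucine_codon_shares(cds: str) -> dict[str, float]:
    """Relative use of each leucine codon among all leucine codons in the sequence."""
    counts = Counter(c for c in split_codons(cds) if c in LEU_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in LEU_CODONS}
    return {codon: counts[codon] / total for codon in LEU_CODONS}


if __name__ == "__main__":
    toy_cds = "ATGGCCCTGAAA"  # toy sequence, not a real PTBP gene
    print(gc3_fraction(toy_cds))
    print(leucine_codon_shares("UUGCUGUUGAAA")["TTG"])
```

Comparing such values between paralogs, or between a gene and its flanking non-coding sequence, is one simple way to visualise the kind of compositional contrast the abstract describes. The study itself may have used different metrics and software.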