National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland.
Genome Biol Evol. 2017 Oct 1;9(10):2791-2811. doi: 10.1093/gbe/evx189.
The origin of new biological functions is a complex phenomenon, with mechanisms ranging from single-nucleotide substitutions to the gain of new genes via horizontal gene transfer or duplication. Neofunctionalization and subfunctionalization of proteins are often attributed to the emergence of paralogs that are subject to relaxed purifying selection or positive selection and thus evolve at accelerated rates. Such phenomena could potentially be detected as anomalies in the phylogenies of the respective gene families. We developed a computational pipeline to search for such anomalies in 1,834 orthologous clusters of archaeal genes, focusing on lineage-specific subfamilies that significantly deviate from the expected rate of evolution. Multiple potential cases of neofunctionalization and subfunctionalization were identified, including some in ancient housekeeping gene families, such as those of ribosomal protein S10, the general transcription factor TFIIB, and the chaperone Hsp20. As expected, many cases of apparent acceleration of evolution are associated with lineage-specific gene duplication. In other cases, long branches in phylogenetic trees correspond to horizontal gene transfer across long evolutionary distances. Significant deceleration of evolution is less common than acceleration, and its underlying causes are not well understood; functional shifts accompanied by increased constraints could be involved. Many gene families appear to be "highly evolvable," that is, they include both long and short branches. Even in the absence of precise functional predictions, this approach allows one to select targets for experimentation in search of new biology.
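The abstract does not specify how rate anomalies are scored, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes a gene-family tree with branch lengths, measures each leaf's root-to-tip distance, and flags leaves whose distance deviates strongly from the family median. The tuple-based tree layout, the median/MAD scoring, the threshold of 3, and all function names are illustrative assumptions.

```python
from statistics import median

# Toy gene-family tree (hypothetical data): each node is
# (name, branch_length, children). Leaf F sits on a long branch.
tree = ("root", 0.0, [
    ("n1", 0.1, [("A", 0.40, []), ("B", 0.42, [])]),
    ("n2", 0.1, [("C", 0.38, []), ("D", 0.41, [])]),
    ("n3", 0.1, [("E", 0.39, []), ("F", 1.50, [])]),
])


def root_to_tip(node, depth=0.0, out=None):
    """Return {leaf_name: summed branch length from the root to that leaf}."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        root_to_tip(child, depth, out)
    return out


def robust_z(values):
    """Score each value by its deviation from the median, in scaled-MAD units."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return {name: 0.0 for name in values}
    return {name: (v - med) / scale for name, v in values.items()}


def flag_outliers(tree, threshold=3.0):
    """Return (accelerated, decelerated) leaf names; threshold is an assumption."""
    scores = robust_z(root_to_tip(tree))
    fast = sorted(n for n, s in scores.items() if s > threshold)
    slow = sorted(n for n, s in scores.items() if s < -threshold)
    return fast, slow


fast, slow = flag_outliers(tree)
```

On this toy tree, only leaf F is flagged as accelerated and no leaf is flagged as decelerated. A real analysis would need to score whole lineage-specific subfamilies rather than single leaves, and to account for the expected rate differences between lineages, which this sketch ignores.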