UMR CNRS 7138 Systématique, Adaptation, Evolution, Muséum National d'Histoire Naturelle, Paris, France.
Mol Biol Evol. 2011 Apr;28(4):1393-405. doi: 10.1093/molbev/msq323. Epub 2010 Dec 20.
Phylogenomic studies produce increasingly large phylogenetic forests of trees with patchy taxonomic sampling. Typically, prokaryotic data generate thousands of gene trees of all sizes that are difficult, if not impossible, to root. Their topologies do not match the genealogy of lineages, because they are shaped not only by duplication, loss, and vertical descent but also by lateral gene transfer (LGT) and recombination. Because this complexity partly reflects the diversity of evolutionary processes, the study of phylogenetic forests offers a great opportunity to improve our understanding of prokaryotic evolution. Here, we show how the rich evolutionary content of these novel phylogenetic objects can be exploited by developing new approaches designed specifically to extract the multiple evolutionary signals present in the forest of life, namely by slicing trees into remarkable bits and pieces: clans, slices, and clips. We harvested a forest of 6,901 unrooted gene trees comprising up to 100 prokaryotic genomes (41 archaea and 59 bacteria) to search for evolutionary events that a species tree would not account for. We identified 1) trees and partitions of trees that reflected the lifestyle of organisms rather than their taxonomy; 2) candidate lifestyle-specific genetic modules, used by distinct, unrelated organisms to adapt to the same environment; and 3) gene families, nonrandomly distributed in functional space, that were frequently exchanged between archaea and bacteria, sometimes without major changes in their sequences. Finally, 4) we reconstructed polarized networks of genetic partnerships between archaea and bacteria to describe some of the rules governing LGT between these two domains.
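The clan concept mentioned above can be illustrated with a short sketch. Assuming the standard definition from the clanistics literature (a clan is the set of leaves on one side of an edge of an unrooted tree, so no root is required), the Python below enumerates every clan of a small hypothetical tree and checks whether a chosen set of taxa, here labeled `A*` for archaea and `B*` for bacteria, forms a "perfect" clan, that is, occupies exactly one side of some split. The tree, the taxon names, and the helper functions are illustrative assumptions, not the authors' implementation; slices and clips are not covered.

```python
from collections import defaultdict


def leaves_on_side(adj, start, blocked):
    """Return the leaves reachable from `start` without crossing the edge to `blocked`."""
    seen = {start, blocked}
    stack = [start]
    found = set()
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:  # a degree-1 node is a leaf (a sequence)
            found.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(found)


def clans(edges):
    """Enumerate all clans of an unrooted tree: both leaf sets of every edge split."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    result = set()
    for u, v in edges:
        result.add(leaves_on_side(adj, u, v))
        result.add(leaves_on_side(adj, v, u))
    return result


def is_perfect_clan(edges, taxa):
    """True if `taxa` is exactly one side of some split of the unrooted tree."""
    return frozenset(taxa) in clans(edges)


# Hypothetical unrooted gene tree with internal nodes n1, n2, n3:
# archaeal sequences A1, A2 and bacterial sequences B1, B2, B3.
edges = [("n1", "A1"), ("n1", "A2"), ("n2", "B1"), ("n2", "B2"),
         ("n3", "n1"), ("n3", "n2"), ("n3", "B3")]
print(is_perfect_clan(edges, {"A1", "A2"}))  # archaea on one side of a split
print(is_perfect_clan(edges, {"A1", "B1"}))  # not a clan in this tree
```

Because clans only depend on edge splits, this test works on gene trees that cannot be rooted. A tree in which the archaeal sequences fail to form a perfect clan mixes the two domains, the kind of pattern a species tree would not account for; whether such mixing reflects LGT or reconstruction artifacts would need to be assessed separately.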