de Rosa R, Labedan B
Institut de Génétique et Microbiologie, Université Paris-Sud, Orsay, France.
Mol Biol Evol. 1998 Jan;15(1):17-27. doi: 10.1093/oxfordjournals.molbev.a025843.
We sought to characterize the last common ancestor of Haemophilus influenzae and Escherichia coli and to determine how each bacterium could have diverged from this putative organism. The approach used was an exhaustive analysis of the homologous proteins encoded by genes present in these bacteria, using as criteria for sequence relatedness an alignment of at least 80 amino acid residues and a PAM distance (number of accepted point mutations per 100 residues separating two sequences) below 250. Evolutionarily significant similarities were found between 1,345 H. influenzae proteins (85% of the total genome) and 3,058 E. coli proteins (75% of the total genome), many of them belonging to families of various sizes (from 666 doublets to 35 large groups of more than 10 members). Nearly all the genes found by this approach to be duplicated in both bacteria were already duplicated in their last common ancestor. This was deduced from (1) the comparison of the respective distributions of evolutionary distances between orthologs (genes separated only by speciation events) and paralogs (genes duplicated in the same genome) and (2) the analysis of the phylogenetic trees reconstructed for each family of paralogs containing at least two members belonging to each bacterium. The distributions of the different categories of homologs show that H. influenzae has lost a significant number of paralogous genes (a reduction proportional to genome size), many sequences that are still present in one copy in E. coli, and some entire gene families. Phylogenetic trees also confirmed this recent loss of paralogous genes in H. influenzae. Thus, the genome size of the last common ancestor of these two bacteria would have been close to that of present-day E. coli, and the evolution of H. influenzae toward a parasitic life led to a substantial decrease in its genome size by some mechanism of streamlining.
During this recent evolution, the memory of the gene order present in the last common ancestor has been blurred, but a few short conserved chromosomal fragments can still be detected in present-day E. coli and H. influenzae.
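The two relatedness criteria stated above (an alignment of at least 80 residues and a PAM distance below 250) can be illustrated with a minimal Python sketch. The `Hit` record, the function names, and the protein labels are hypothetical and are not taken from the paper. The sketch assumes pairwise alignment lengths and PAM distances have already been computed by some other tool. Labelling a cross-genome pair a "candidate ortholog" is a simplification: the abstract assigns orthology and paralogy from distance distributions and phylogenetic trees, not from genome membership alone.

```python
from dataclasses import dataclass

# Thresholds as stated in the abstract.
MIN_ALIGNED_RESIDUES = 80
MAX_PAM_DISTANCE = 250.0


@dataclass(frozen=True)
class Hit:
    """One pairwise protein comparison (hypothetical record layout)."""
    genome_a: str
    protein_a: str
    genome_b: str
    protein_b: str
    aligned_residues: int
    pam_distance: float


def is_significant(hit: Hit) -> bool:
    """Apply both criteria: alignment length >= 80 and PAM distance < 250."""
    return (hit.aligned_residues >= MIN_ALIGNED_RESIDUES
            and hit.pam_distance < MAX_PAM_DISTANCE)


def classify(hit: Hit) -> str:
    """Rough first-pass label; real ortholog/paralog calls need tree analysis."""
    if not is_significant(hit):
        return "not_significant"
    if hit.genome_a == hit.genome_b:
        return "paralog"            # related pair within one genome
    return "candidate_ortholog"     # related pair across the two genomes


# Made-up example values, for illustration only.
example_hits = [
    Hit("H. influenzae", "hi_p1", "E. coli", "ec_p1", 210, 95.0),
    Hit("E. coli", "ec_p1", "E. coli", "ec_p2", 150, 180.0),
    Hit("H. influenzae", "hi_p2", "E. coli", "ec_p3", 60, 40.0),   # too short
    Hit("H. influenzae", "hi_p3", "E. coli", "ec_p4", 300, 250.0), # too distant
]

labels = [classify(h) for h in example_hits]
```

A pair that fails either threshold is set aside before any family grouping, which is why the third and fourth example hits are not labelled as homologs even though one has a low distance and the other a long alignment.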