Konstantinidis Konstantinos T, Ramette Alban, Tiedje James M
Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, USA.
Appl Environ Microbiol. 2006 Nov;72(11):7286-93. doi: 10.1128/AEM.01398-06. Epub 2006 Sep 15.
Phylogenetic sequence analysis of single or multiple genes has dominated the study and census of the genetic diversity among closely related bacteria. It remains unclear, however, how the results based on a few genes in the genome correlate with whole-genome-based relatedness and which genes (if any) best reflect whole-genome-level relatedness and hence should be preferentially used to economize on cost and to improve accuracy. We show here that phylogenies of closely related organisms based on the average nucleotide identity (ANI) of their shared genes correspond accurately to phylogenies based on state-of-the-art analysis of their whole-genome sequences. We use ANI to evaluate the phylogenetic robustness of every gene in the genome and show that almost all core genes, regardless of their functions and positions in the genome, offer robust phylogenetic reconstruction among strains that show 80 to 95% ANI (16S rRNA identity, >98.5%). Lack of elapsed time and, to a lesser extent, horizontal transfer and recombination make the selection of genes more critical for applications that target the intraspecies level, i.e., strains that show >95% ANI according to current standards. For the Escherichia coli group, just the three best-performing genes identified by our analysis yielded a much more accurate phylogeny than the concatenated alignment of eight genes that are commonly employed for phylogenetic purposes in this group. Our results are reproducible within the Salmonella, Burkholderia, and Shewanella groups and therefore are expected to have general applicability for microevolution studies, including metagenomic surveys.
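To make the ANI measure concrete, the following is a minimal sketch of how an average nucleotide identity over shared genes could be computed. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not describe how shared genes are identified or aligned, so the sketch assumes each shared gene is already supplied as a pair of gap-padded aligned sequences. The function names (`percent_identity`, `average_nucleotide_identity`, `relatedness_band`) and the toy sequences are hypothetical; only the 80% and 95% thresholds come from the abstract.

```python
from statistics import mean


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of one pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def average_nucleotide_identity(shared_genes: dict) -> float:
    """Unweighted mean of per-gene identities (an assumption of this sketch)."""
    return mean(percent_identity(a, b) for a, b in shared_genes.values())


def relatedness_band(ani: float) -> str:
    """Map an ANI value onto the ranges mentioned in the abstract."""
    if ani > 95.0:
        return "intraspecies (>95% ANI)"
    if ani >= 80.0:
        return "80-95% ANI"
    return "below 80% ANI"


# Toy data: hypothetical gene names and sequences, for illustration only.
toy_shared_genes = {
    "geneA": ("ACGTACGTAC", "ACGTACGTAC"),  # 10 of 10 columns identical
    "geneB": ("ACGTACGTAC", "ACGTACGTAA"),  # 9 of 10 columns identical
}

toy_ani = average_nucleotide_identity(toy_shared_genes)
print(f"ANI = {toy_ani:.1f}% -> {relatedness_band(toy_ani)}")
```

The same per-gene identity values could, in principle, be compared against the genome-wide ANI to ask how well any single gene tracks whole-genome relatedness, which is the kind of per-gene evaluation the abstract describes. Averaging per gene rather than per aligned base is a design choice of this sketch; a length-weighted average would be an equally plausible reading.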