School of Biological Sciences, Seoul National University, Seoul 151-742, Republic of Korea.
Interdisciplinary Program in Bioinformatics and Bioinformatics Institute, Seoul National University, Seoul 151-742, Republic of Korea.
Int J Syst Evol Microbiol. 2014 Feb;64(Pt 2):346-351. doi: 10.1099/ijs.0.059774-0.
Among available genome relatedness indices, average nucleotide identity (ANI) is one of the most robust measurements of genomic relatedness between strains. It has great potential in the taxonomy of bacteria and archaea as a substitute for the labour-intensive DNA-DNA hybridization (DDH) technique. An ANI threshold range (95-96%) for species demarcation had previously been suggested on the basis of comparisons between DDH and ANI values, albeit with rather limited datasets, and its generality had not been tested across all prokaryotic lineages. Here, we investigated the overall distribution of ANI values generated by pairwise comparison of 6787 prokaryotic genomes belonging to 22 phyla to determine whether the suggested range applies to all species. The overall ANI distribution showed an apparent distinction between intra- and interspecies relationships at around 95-96% ANI. We then used over one million comparisons to determine which level of 16S rRNA gene sequence similarity corresponds to the currently accepted ANI threshold for species demarcation. A twofold cross-validation statistical test revealed that 98.65% 16S rRNA gene sequence similarity can be used as the threshold for differentiating two species. This value is consistent with previous suggestions (98.2-99.0%) derived from comparative studies between DDH and 16S rRNA gene sequence similarity. Our findings should help accelerate the use of genomic sequence data in the taxonomy of bacteria and archaea.
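The threshold search described above can be illustrated with a minimal sketch. It assumes that each genome pair is reduced to a hypothetical `(ANI %, 16S similarity %)` record, that pairs with ANI of at least 95% are treated as the same species, and that a 16S threshold is chosen to maximise agreement with that ANI-based call on one half of the data and then scored on the other half. The function names, the record format, the 95% cutoff and the accuracy criterion are all illustrative assumptions, not the authors' published procedure. The sketch only tries observed similarity values as candidate thresholds, so it will not necessarily reproduce a value such as 98.65%.

```python
import random
from typing import List, Tuple

# Hypothetical record format: (ANI %, 16S rRNA gene similarity %) for one genome pair.
Record = Tuple[float, float]

ANI_CUTOFF = 95.0  # assumption: lower end of the 95-96% ANI species range


def accuracy(records: List[Record], threshold_16s: float) -> float:
    """Fraction of pairs whose 16S-based call agrees with the ANI-based call."""
    if not records:
        raise ValueError("need at least one genome pair")
    agree = sum(
        (sim >= threshold_16s) == (ani >= ANI_CUTOFF) for ani, sim in records
    )
    return agree / len(records)


def best_threshold(records: List[Record]) -> float:
    """Pick the observed 16S similarity value that maximises agreement with ANI."""
    candidates = sorted({sim for _, sim in records})
    # max() returns the first (lowest) candidate among ties
    return max(candidates, key=lambda t: accuracy(records, t))


def twofold_cv(records: List[Record], seed: int = 0) -> List[Tuple[float, float]]:
    """Fit a threshold on each half and score it on the other half.

    Returns one (threshold, held-out accuracy) tuple per fold.
    Needs at least two records so that neither fold is empty.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    fold_a, fold_b = shuffled[:half], shuffled[half:]
    results = []
    for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
        t = best_threshold(train)
        results.append((t, accuracy(test, t)))
    return results


# Made-up toy pairs, for illustration only (not data from the study).
toy = [(97.5, 99.8), (96.2, 99.1), (88.0, 98.0), (80.3, 95.4)]
print(best_threshold(toy))  # prints 99.1
print(twofold_cv(toy))
```

On the toy pairs, 99.1% is the lowest observed 16S value that separates the two ANI-defined groups without error. A real analysis over a million pairs would more likely scan a fine grid of candidate thresholds and could use a different optimisation criterion.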