School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20894, USA.
Nat Commun. 2018 Nov 30;9(1):5114. doi: 10.1038/s41467-018-07641-9.
A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes, or whether clear species boundaries prevail instead. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) help address this question by facilitating high-resolution taxonomic analysis of thousands of genomes from diverse phylogenetic lineages. To scale to available genomes and beyond, we present FastANI, a new method to estimate ANI using alignment-free approximate sequence mapping. FastANI is accurate for both finished and draft genomes, and is up to three orders of magnitude faster than alignment-based approaches. We leverage FastANI to compute pairwise ANI values among all prokaryotic genomes available in the NCBI database. Our results reveal clear genetic discontinuity, with 99.8% of the total 8 billion genome pairs analyzed conforming to >95% intra-species and <83% inter-species ANI values. This discontinuity is manifested with or without the most frequently sequenced species, and is robust to historic additions in the genome databases.
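The discontinuity criterion stated in the abstract can be illustrated with a minimal Python sketch. It assumes pairwise ANI values (in percent) have already been computed by some tool, and it only applies the >95% and <83% thresholds quoted above. The function names `classify_pair` and `conforming_fraction` and the example values are hypothetical, chosen for illustration; they are not part of FastANI and are not data from the study.

```python
from typing import Iterable

# Thresholds quoted in the abstract, in percent ANI.
INTRA_SPECIES_MIN = 95.0
INTER_SPECIES_MAX = 83.0


def classify_pair(ani: float) -> str:
    """Label one genome pair by its ANI value (hypothetical helper)."""
    if ani > INTRA_SPECIES_MIN:
        return "intra-species"
    if ani < INTER_SPECIES_MAX:
        return "inter-species"
    # Values from 83% to 95% inclusive fall in the gap between the two ranges.
    return "intermediate"


def conforming_fraction(ani_values: Iterable[float]) -> float:
    """Fraction of pairs that fall outside the 83-95% gap."""
    labels = [classify_pair(a) for a in ani_values]
    if not labels:
        raise ValueError("need at least one ANI value")
    return sum(label != "intermediate" for label in labels) / len(labels)


# Made-up ANI values for illustration only.
example = [99.1, 97.4, 96.0, 88.5, 80.2, 78.9, 76.3, 75.0]
print(conforming_fraction(example))  # 7 of 8 pairs conform -> 0.875
```

Under these assumptions, a conforming fraction close to 1 would correspond to the clear discontinuity the abstract reports, while a large share of "intermediate" pairs would point toward a continuum.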