Wittouck Stijn, Wuyts Sander, Meehan Conor J, van Noort Vera, Lebeer Sarah
Research Group Environmental Ecology and Applied Microbiology, Department of Bioscience Engineering, University of Antwerp, Antwerp, Belgium.
Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium.
mSystems. 2019 Sep 3;4(5):e00264-19. doi: 10.1128/mSystems.00264-19.
There are more than 200 published species within the Lactobacillus genus complex (LGC), the majority of which have sequenced type strain genomes available. Although genome-based species delimitation cutoffs are accepted as the gold standard by the community, they are seldom checked for new or already published species. In addition, the availability of genome data is revealing inconsistencies in the species-level classification of many strains. We constructed a species taxonomy for the LGC based on 2,459 publicly available genomes, using a 94% core nucleotide identity cutoff. We reconciled these species with published species and subspecies names by (i) identifying genomes of type strains and (ii) comparing 16S rRNA genes of the genomes with 16S rRNA genes of type strains. We found that genomes within the LGC could be divided into 239 species that were discontinuous and exclusive. Comparison of these species to published species led to the identification of nine sets of published species that can be merged and one species that can be split. Further, we found at least eight species that constitute new, unpublished species. Finally, we reclassified 74 genomes on the species level and assigned a species to 98 genomes for the first time. Overall, the current state of LGC species taxonomy is largely consistent with genome-based species delimitation cutoffs. There are, however, exceptions that should be resolved to evolve toward a taxonomy where species share a consistent diversity in terms of sequence divergence.
The Lactobacillus genus complex is a group of bacteria that constitutes an important source of strains with medical and food applications. The number of bacterial whole-genome sequences available for this taxon has been increasing rapidly in recent years. Despite this wealth of information, the species within this group are still largely defined by older techniques. Here, we constructed a completely new species-level taxonomy for the Lactobacillus genus complex based on ∼2,500 whole-genome sequences. As a result of this effort, we found that many genomes are not classified to their correct species, and we corrected their classification. In addition, we found that some published species are abnormally large, while others are too small. Finally, we discovered at least eight completely novel species that have not been published before. Our work will help the field to evolve toward a more meaningful and complete taxonomy, based on whole-genome sequences.
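The cutoff-based delimitation described above can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not state which clustering algorithm was used, so single-linkage clustering is assumed here, "exclusive" is read as "every within-species identity exceeds every identity to a genome outside the species", and the genome names and identity values are invented toy data. Only the 94% core nucleotide identity (CNI) cutoff comes from the text.

```python
from itertools import combinations

CNI_CUTOFF = 0.94  # species-level cutoff stated in the abstract

# Hypothetical toy data: four genomes and their pairwise CNI values.
GENOMES = ["A", "B", "C", "D"]
CNI = {
    frozenset(("A", "B")): 0.98,
    frozenset(("A", "C")): 0.80,
    frozenset(("A", "D")): 0.81,
    frozenset(("B", "C")): 0.79,
    frozenset(("B", "D")): 0.80,
    frozenset(("C", "D")): 0.96,
}


def cluster_genomes(genomes, cni, cutoff=CNI_CUTOFF):
    """Single-linkage clustering (an assumption): two genomes are linked
    whenever their pairwise CNI is at or above the cutoff."""
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if cni[frozenset((a, b))] >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in clusters.values())


def is_exclusive(cluster, genomes, cni):
    """Assumed reading of 'exclusive': the lowest identity inside the cluster
    is higher than the highest identity between a member and a non-member."""
    members = set(cluster)
    outsiders = [g for g in genomes if g not in members]
    within = [cni[frozenset(pair)] for pair in combinations(cluster, 2)]
    between = [cni[frozenset((m, o))] for m in cluster for o in outsiders]
    if not within or not between:
        return True  # singleton cluster or no outsiders: nothing to violate
    return min(within) > max(between)


if __name__ == "__main__":
    for species in cluster_genomes(GENOMES, CNI):
        print(species, "exclusive:", is_exclusive(species, GENOMES, CNI))
```

With real data, the pairwise identities would come from core-genome alignments rather than a hand-written dictionary, and a dedicated clustering library would scale better than the all-pairs loop used here.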