Li Xiangyang, Yang Zilin, Wang Zhao, Li Weipeng, Zhang Guohui, Yan Hongguang
School of Sciences, Kaili University, Kaili, China.
Bacterial Genome Data Mining and Bioinformatic Analysis Center, Kaili University, Kaili, China.
Front Microbiol. 2022 Jan 13;12:755874. doi: 10.3389/fmicb.2021.755874. eCollection 2021.
is a species complex with extremely broad phenotypic and genotypic diversity. However, very little is known about its diversity, taxonomy and phylogeny at the genomic scale. To address these issues, we systematically and comprehensively defined the taxonomy and nomenclature for this species complex and explored its genetic diversity using hundreds of sequenced genomes. By combining average nucleotide identity (ANI) evaluation and phylogenetic inference approaches, we identified 123 complex genomes covering at least six well-defined species among all sequenced genomes; of these, 25 genomes represented novel members of this species complex. ANI values of ≥∼95% and digital DNA-DNA hybridization (dDDH) values of ≥∼60% in combination with phylogenomic analysis consistently and robustly supported the division of these strains into 27 genomovars (most likely species to some extent), comprising 16 known and 11 unknown genomovars. We revealed that 12 strains had mistaken taxonomic assignments, while 16 strains without species names can be assigned to the species level within the species complex. We observed an open pan-genome of the complex comprising 13,261 gene families, of which approximately 45% do not match any sequence present in the COG database, and a large proportion of accessory genes. The genome contents experienced extensive genetic gain and loss events, which may be one of the major mechanisms driving diversification within this species complex. Surprisingly, we found that the ectoine biosynthesis gene cluster () was present in all genomes of species complex strains but distributed at very low frequency (43 out of 9548) in other genomes, suggesting a possible origin of the ancestors of species complex in high-osmolarity environments. Collectively, our study highlights the potential of using whole-genome sequences to re-evaluate the current definition of the complex, shedding new light on its genomic diversity and evolutionary history.
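As a rough illustration of the ANI criterion mentioned in the abstract, the sketch below groups genomes whenever their pairwise ANI is at least 95%. It is not the authors' pipeline: the abstract states that ANI was combined with dDDH and phylogenomic analysis, whereas this example uses the ANI threshold alone. The single-linkage grouping rule, the function name `group_by_ani`, the genome labels, and all ANI values are assumptions made up for the example.

```python
from itertools import combinations


def group_by_ani(genomes, ani, threshold=95.0):
    """Group genomes by single linkage: two genomes share a group when they
    are connected by a chain of pairs with ANI >= threshold (in percent)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # ANI tables are often stored for one orientation of each pair only.
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise ANI values (percent) for four made-up genomes.
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    ("g1", "g2"): 97.8, ("g1", "g3"): 88.1, ("g2", "g3"): 87.9,
    ("g3", "g4"): 96.2, ("g1", "g4"): 88.4, ("g2", "g4"): 88.0,
}
print(group_by_ani(genomes, ani))  # [['g1', 'g2'], ['g3', 'g4']]
```

Single linkage can merge groups through one borderline pair, which is one reason a threshold of this kind is usually checked against independent evidence such as a phylogeny.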