Institut Pasteur, Université Paris Cité, Biodiversity and Epidemiology of Bacterial Pathogens, Paris, France.
Sorbonne Université, Collège Doctoral, Paris, France.
Mol Biol Evol. 2022 Jul 2;39(7). doi: 10.1093/molbev/msac135.
Sublineages (SLs) within microbial species can differ widely in their ecology and pathogenicity, and their precise definition is important in basic research and for industrial or public health applications. No widely accepted strategy for defining SLs currently exists, which hampers communication in population biology and epidemiological surveillance. Here, we propose a broadly applicable genomic classification and nomenclature approach for bacterial strains, using the prominent public health threat Klebsiella pneumoniae as a model. Based on a 629-gene core genome multilocus sequence typing (cgMLST) scheme, we devised a dual barcoding system that combines multilevel single linkage (MLSL) clustering and life identification numbers (LINs). Phylogenetic and clustering analyses of >7,000 genome sequences captured population structure discontinuities, which were used to guide the definition of 10 infraspecific genetic dissimilarity thresholds. The widely used 7-gene multilocus sequence typing (MLST) nomenclature was mapped onto MLSL sublineage (threshold: 190 allelic mismatches) and clonal group (threshold: 43 allelic mismatches) identifiers for backward compatibility of nomenclature. The taxonomy is publicly accessible through a community-curated platform (https://bigsdb.pasteur.fr/klebsiella), which also lets external users identify their own genomic sequences. The proposed strain taxonomy combines two phylogenetically informative barcode systems that provide full stability (LIN codes) and continuity with previous nomenclature (MLSL). This species-specific dual barcoding strategy for the genomic taxonomy of microbial strains is broadly applicable and should help unify global and cross-sector collaborative knowledge on the emergence and microevolution of bacterial pathogens.
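The single linkage step behind the MLSL barcode can be illustrated with a minimal Python sketch. It is not the implementation used on the BIGSdb platform: the function names are hypothetical, the profiles are toy data with 6 loci rather than 629, and the thresholds (2 and 1) are scaled-down stand-ins for the sublineage (190) and clonal group (43) thresholds. Two further assumptions are that a locus with a missing allele call is ignored when counting mismatches, and that a pair is linked when its mismatch count is less than or equal to the threshold.

```python
from itertools import combinations


def allelic_mismatches(a, b):
    """Count loci where both profiles have a called allele and the alleles differ.

    Assumption: loci with a missing call (None) in either profile are skipped.
    """
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage(profiles, threshold):
    """Label each profile with the smallest index in its single linkage cluster.

    Assumption: two profiles are linked when mismatches <= threshold.
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if allelic_mismatches(profiles[i], profiles[j]) <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as root so labels are deterministic.
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(profiles))]


def multilevel_barcode(profiles, thresholds):
    """Cluster at each threshold and return one tuple of labels per profile."""
    levels = [single_linkage(profiles, t) for t in thresholds]
    return [tuple(level[i] for level in levels) for i in range(len(profiles))]


# Toy cgMLST profiles (allele numbers per locus); None marks a missing call.
profiles = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 2],     # 1 mismatch from profile 0
    [1, 1, 1, 2, 2, 2],     # 3 mismatches from profile 0, 2 from profile 1
    [9, 9, 9, 9, 9, 9],
    [9, 9, 9, 9, None, 8],  # 1 counted mismatch from profile 3
]

barcodes = multilevel_barcode(profiles, thresholds=(2, 1))
print(barcodes)
# [(0, 0), (0, 0), (0, 2), (3, 3), (3, 3)]
```

In this toy run, profile 2 joins profiles 0 and 1 at the looser threshold only through its link to profile 1, although it is 3 mismatches from profile 0. This chaining is characteristic of single linkage. At the stricter threshold, profile 2 forms its own group.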