Nuffield Department of Population Health, University of Oxford, Oxford, UK.
Department of Biology, University of Oxford, Oxford, UK.
Microb Genom. 2024 Aug;10(8). doi: 10.1099/mgen.0.001280.
Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonization, disease, antimicrobial resistance and vaccine impact. Furthermore, the recent accumulation of large numbers of whole-genome sequences for many bacterial species enables the development of robust genome-wide typing schemes that define the overall bacterial population structure and the lineages within it. Using previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30 976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprising 1222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds and defined these clusters with a taxonomic life identification number (LIN) barcoding system. Together, the PGL, cgMLST scheme and LIN barcodes provide a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, supporting the genomic epidemiology and surveillance of this leading global pathogen.
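The multilevel single-linkage clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four-locus toy profiles, the mismatch thresholds (2, 1, 0) and the function names are all hypothetical, missing alleles are ignored, and the resulting tuples are only barcode-like labels rather than real LIN codes. The idea shown is that isolates are grouped at successively stricter allelic mismatch thresholds, and the cluster label at each level forms one position of a hierarchical code.

```python
from itertools import combinations


def mismatches(a, b):
    """Count loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(profiles, threshold):
    """Label each profile with the smallest index in its single-linkage cluster.

    Two profiles are linked when they differ at no more than `threshold` loci;
    clusters are the connected components of that link graph (union-find).
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if mismatches(profiles[i], profiles[j]) <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as the root so labels are stable.
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(profiles))]


def hierarchical_codes(profiles, thresholds):
    """Build one label per threshold, ordered from loosest to strictest."""
    levels = [single_linkage(profiles, t) for t in sorted(thresholds, reverse=True)]
    return [tuple(level[i] for level in levels) for i in range(len(profiles))]


# Hypothetical toy data: four isolates typed at four loci.
profiles = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 3, 3],
    [5, 6, 7, 8],
]
codes = hierarchical_codes(profiles, thresholds=[2, 1, 0])
for profile, code in zip(profiles, codes):
    print(profile, code)
```

Because single-linkage clusters at a stricter threshold are always nested inside those at a looser one, isolates that share a longer prefix of the code are more closely related. A real scheme would apply the same logic across all 1222 cgMLST loci with thresholds chosen from the population structure.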