Fast and flexible bacterial genomic epidemiology with PopPUNK.

Affiliations

Department of Microbiology, New York University School of Medicine, New York, New York 10016, USA.

Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom.

Publication information

Genome Res. 2019 Feb;29(2):304-316. doi: 10.1101/gr.241455.118. Epub 2019 Jan 24.

DOI: 10.1101/gr.241455.118

PMID: 30679308

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6360808/

Abstract

The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), a software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates' divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species' diverse evolutionary patterns. PopPUNK can process 10³–10⁵ genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, which is achieved without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication are streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.
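The abstract only summarizes how variable-length k-mer comparisons separate divergence in shared sequence (core) from divergence in gene content (accessory), and how closely related isolates are then linked into clusters. The Python sketch below illustrates one way to do both under stated assumptions. It assumes a simple model in which the chance that a k-mer is shared between two genomes is (1 - a)(1 - π)^k, for accessory distance a and core distance π, so that a straight-line fit of log(match) against k gives log(1 - π) as the slope and log(1 - a) as the intercept; the match proportion is approximated here by the Jaccard index of exact k-mer sets. The k-mer lengths, the fixed distance thresholds, and the names `core_accessory` and `StrainNetwork` are hypothetical choices for illustration, not PopPUNK's API. PopPUNK instead fits a model to the observed distance distribution to decide which links are within-strain, and a real tool would use compressed sketches rather than full k-mer sets.

```python
import math
import random


def kmer_set(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (forward strand only, to keep the sketch short)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int) -> float:
    """Jaccard index of the k-mer sets of a and b."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def core_accessory(a: str, b: str, ks=(13, 17, 21, 25, 29)) -> tuple[float, float]:
    """Estimate (core, accessory) distances with a least-squares fit of
    log J_k = k * log(1 - core) + log(1 - accessory) across k-mer lengths.
    The k values are a hypothetical choice for this sketch."""
    pts = []
    for k in ks:
        j = jaccard(a, b, k)
        if j > 0:  # log(0) is undefined; skip lengths with no shared k-mers
            pts.append((k, math.log(j)))
    if len(pts) < 2:
        return 1.0, 1.0  # too divergent to fit a line: treat as maximally distant
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    intercept = my - slope * mx
    # Clamp to [0, 1]: sampling noise can push either estimate slightly negative.
    core = min(max(1.0 - math.exp(slope), 0.0), 1.0)
    accessory = min(max(1.0 - math.exp(intercept), 0.0), 1.0)
    return core, accessory


class StrainNetwork:
    """Link genomes whose distances fall under fixed (hypothetical) thresholds;
    clusters are the connected components, tracked with union-find."""

    def __init__(self, max_core: float = 0.005, max_accessory: float = 0.05):
        self.max_core, self.max_accessory = max_core, max_accessory
        self.seqs: dict[str, str] = {}
        self.parent: dict[str, str] = {}

    def _find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add(self, name: str, seq: str) -> None:
        """Compare only the incoming genome with those already present."""
        self.parent[name] = name
        for other, other_seq in self.seqs.items():
            core, acc = core_accessory(seq, other_seq)
            if core <= self.max_core and acc <= self.max_accessory:
                self.parent[self._find(name)] = self._find(other)  # merge components
        self.seqs[name] = seq

    def clusters(self) -> list[set[str]]:
        groups: dict[str, set[str]] = {}
        for name in self.parent:
            groups.setdefault(self._find(name), set()).add(name)
        return list(groups.values())


# Toy data generators for the demonstration below (simulated sequences only).
def random_genome(length: int, rng: random.Random) -> str:
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq: str, n_snps: int, rng: random.Random) -> str:
    """Introduce n_snps substitutions at distinct random positions."""
    s = list(seq)
    for i in rng.sample(range(len(s)), n_snps):
        s[i] = rng.choice([c for c in "ACGT" if c != s[i]])
    return "".join(s)


if __name__ == "__main__":
    rng = random.Random(1)
    base = random_genome(5000, rng)
    net = StrainNetwork()
    net.add("A", base)
    net.add("B", mutate(base, 5, rng))      # same strain: a few SNPs
    net.add("C", random_genome(5000, rng))  # unrelated genome
    print(net.clusters())
```

Because `add` only computes distances between the incoming genome and those already in the network, earlier links are never recomputed, which loosely mirrors the idea of expanding clusters without reanalyzing every genome; note that a new genome can still merge two existing clusters if it links to both.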

Similar articles

Genealogical inference and more flexible sequence clustering using iterative-PopPUNK.

Genome Res. 2023 Jun;33(6):988-998. doi: 10.1101/gr.277395.122. Epub 2023 May 30.

Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel.

Microb Genom. 2021 Sep;7(9). doi: 10.1099/mgen.0.000651.

Comparison of gene-by-gene and genome-wide short nucleotide sequence-based approaches to define the global population structure of .

Microb Genom. 2024 Aug;10(8). doi: 10.1099/mgen.0.001278.

Annotated Whole-Genome Multilocus Sequence Typing Schema for Scalable High-Resolution Typing of Streptococcus pyogenes.

J Clin Microbiol. 2022 Jun 15;60(6):e0031522. doi: 10.1128/jcm.00315-22. Epub 2022 May 9.

NanoCore: core-genome-based bacterial genomic surveillance and outbreak detection in healthcare facilities from Nanopore and Illumina data.

mSystems. 2024 Nov 19;9(11):e0108024. doi: 10.1128/msystems.01080-24. Epub 2024 Oct 7.

KmerAperture: Retaining k-mer synteny for alignment-free extraction of core and accessory differences between bacterial genomes.

PLoS Genet. 2024 Apr 29;20(4):e1011184. doi: 10.1371/journal.pgen.1011184. eCollection 2024 Apr.

A Vibrio cholerae Core Genome Multilocus Sequence Typing Scheme To Facilitate the Epidemiological Study of Cholera.

J Bacteriol. 2020 Nov 19;202(24). doi: 10.1128/JB.00086-20.

Machine learning reveals the dynamic importance of accessory sequences for outbreak clustering.

mBio. 2025 Mar 12;16(3):e0265024. doi: 10.1128/mbio.02650-24. Epub 2025 Jan 28.

stringMLST: a fast k-mer based tool for multilocus sequence typing.

Bioinformatics. 2017 Jan 1;33(1):119-121. doi: 10.1093/bioinformatics/btw586. Epub 2016 Sep 7.

Cited by

Evolving genomic landscape of pediatric pneumococcus in two Canadian urban centers following conjugate vaccination.

Front Microbiol. 2025 Aug 18;16:1642658. doi: 10.3389/fmicb.2025.1642658. eCollection 2025.

Genomic profiling of cefotaxime-resistant from Norway and Sweden reveals extensive expansion of virulent multidrug-resistant international clones.

Front Microbiol. 2025 Jul 29;16:1601390. doi: 10.3389/fmicb.2025.1601390. eCollection 2025.

A bacterial and viral genome catalogue from Atlantic salmon highlights diverse gut microbiome compositions at pre- and post-smolt life stages.

Anim Microbiome. 2025 Aug 11;7(1):85. doi: 10.1186/s42523-025-00453-5.

Comparative genome analysis of Pasteurella multocida from Australian domestic animals suggests broad patterns of transmissions across multiple hosts and origins.

PLoS One. 2025 Aug 6;20(8):e0329807. doi: 10.1371/journal.pone.0329807. eCollection 2025.

Genomic analysis and pneumococcal population dynamics across PCV implementation in South Korea, 1997-2023.

Microb Genom. 2025 Jul;11(7). doi: 10.1099/mgen.0.001433.

Translation Accuracy in .

bioRxiv. 2025 Jun 11:2025.04.18.649569. doi: 10.1101/2025.04.18.649569.

skDER and CiDDER: two scalable approaches for microbial genome dereplication.

Microb Genom. 2025 Jul;11(7). doi: 10.1099/mgen.0.001438.

A 69.9-kb long inverted repeat increases genome instability in a strain of .

NAR Genom Bioinform. 2025 Jun 26;7(2):lqaf085. doi: 10.1093/nargab/lqaf085. eCollection 2025 Jun.

Linkage-based ortholog refinement in bacterial pangenomes with CLARC.

Nucleic Acids Res. 2025 Jun 20;53(12). doi: 10.1093/nar/gkaf488.

KPop: accurate and scalable comparative analysis of microbial genomes by sequence embeddings.

Genome Biol. 2025 Jun 18;26(1):170. doi: 10.1186/s13059-025-03585-8.
