Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis.

Affiliations

NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W2 1PG, United Kingdom.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Publication information

Genome Res. 2024 Oct 29;34(10):1661-1673. doi: 10.1101/gr.279449.124.

DOI:10.1101/gr.279449.124

PMID:39406504

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11529842/

Abstract

Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split k-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.
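
As a rough illustration of the idea behind split k-mers, the sketch below treats a split k-mer as two short flanking sequences around a single middle base, and reports a candidate SNP wherever two samples share the same flanks but differ in the middle base. This is a simplified sketch under stated assumptions, not SKA2's implementation: the function names and the flank length are hypothetical, and it ignores reverse complements, ambiguous bases, and sequencing errors.

```python
from collections import defaultdict


def split_kmers(seq, flank=3):
    """Map (left flank, right flank) -> set of middle bases observed in seq.

    `flank` is a hypothetical parameter: the number of bases on each side
    of the middle base.
    """
    seq = seq.upper()
    span = 2 * flank + 1
    table = defaultdict(set)
    for i in range(len(seq) - span + 1):
        left = seq[i:i + flank]
        middle = seq[i + flank]
        right = seq[i + flank + 1:i + span]
        table[(left, right)].add(middle)
    return table


def candidate_snps(seq_a, seq_b, flank=3):
    """Return (left, base_a, base_b, right) for flanks shared by both samples
    whose middle bases differ. Flanks with more than one middle base within
    a sample are skipped as ambiguous."""
    a = split_kmers(seq_a, flank)
    b = split_kmers(seq_b, flank)
    snps = []
    for key in sorted(a.keys() & b.keys()):
        if len(a[key]) == 1 and len(b[key]) == 1:
            (base_a,) = a[key]
            (base_b,) = b[key]
            if base_a != base_b:
                snps.append((key[0], base_a, base_b, key[1]))
    return snps


if __name__ == "__main__":
    # Two toy sequences that differ by a single substitution (G -> A).
    print(candidate_snps("TTGACCGTAAGC", "TTGACCATAAGC"))
```

Because the comparison is keyed only on flanking sequence, no reference genome is needed in this toy version, which is the property the abstract describes as reference-free genotyping.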

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e60d/11529842/63f510df5502/1661f01.jpg

Similar articles

Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data.

Elife. 2024 Oct 10;13:RP98300. doi: 10.7554/eLife.98300.

Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel.

Microb Genom. 2021 Sep;7(9). doi: 10.1099/mgen.0.000651.

Rapid, reference-free identification of bacterial pathogen transmission using optimized split k-mer analysis.

Microb Genom. 2025 Mar;11(3). doi: 10.1099/mgen.0.001347.

NanoCore: core-genome-based bacterial genomic surveillance and outbreak detection in healthcare facilities from Nanopore and Illumina data.

mSystems. 2024 Nov 19;9(11):e0108024. doi: 10.1128/msystems.01080-24. Epub 2024 Oct 7.

Fast and flexible bacterial genomic epidemiology with PopPUNK.

Genome Res. 2019 Feb;29(2):304-316. doi: 10.1101/gr.241455.118. Epub 2019 Jan 24.

Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing.

BMC Genomics. 2020 Sep 14;21(1):631. doi: 10.1186/s12864-020-07041-8.

Positional bias in variant calls against draft reference assemblies.

BMC Genomics. 2017 Mar 28;18(1):263. doi: 10.1186/s12864-017-3637-2.

Efficient and robust search of microbial genomes via phylogenetic compression.

Nat Methods. 2025 Apr;22(4):692-697. doi: 10.1038/s41592-025-02625-2. Epub 2025 Apr 9.

Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies.

Microb Genom. 2024 May;10(5). doi: 10.1099/mgen.0.001244.

Cited by

Genomic diversity of clinically relevant bacterial pathogens from an acute care hospital in Suva, Fiji.

JAC Antimicrob Resist. 2025 Jun 9;7(3):dlaf058. doi: 10.1093/jacamr/dlaf058. eCollection 2025 Jun.

Comparative genome analysis investigation of nosocomial and community-acquired cases of Legionnaires' disease caused by ST2858 and ST378.

Microbiol Spectr. 2025 Jul;13(7):e0051325. doi: 10.1128/spectrum.00513-25. Epub 2025 Jun 9.

Genome-wide approaches to bacterial strain typing: a history and review of recent methodological advances.

Curr Opin Infect Dis. 2025 Aug 1;38(4):329-338. doi: 10.1097/QCO.0000000000001118. Epub 2025 Jun 12.

Are reads required? High-precision variant calling from bacterial genome assemblies.

Access Microbiol. 2025 May 28;7(5). doi: 10.1099/acmi.0.001025.v3. eCollection 2025.

Multidrug-resistant Shigella flexneri outbreak affecting humans and non-human primates in New Mexico, USA.

Nat Commun. 2025 May 20;16(1):4680. doi: 10.1038/s41467-025-59766-3.

Simultaneous detection of pathogens and antimicrobial resistance genes with the open source, cloud-based, CZ ID platform.

Genome Med. 2025 May 6;17(1):46. doi: 10.1186/s13073-025-01480-2.

Integrated population clustering and genomic epidemiology with PopPIPE.

Microb Genom. 2025 Apr;11(4). doi: 10.1099/mgen.0.001404.

Reference-Free Variant Calling with Local Graph Construction with ska lo (SKA).

Mol Biol Evol. 2025 Apr 1;42(4). doi: 10.1093/molbev/msaf077.

Investigating two decades of bacteraemia in the Gelderland area, the Netherlands, using whole-genome sequencing.

Microb Genom. 2025 Mar;11(3). doi: 10.1099/mgen.0.001377.

Rapid, reference-free identification of bacterial pathogen transmission using optimized split k-mer analysis.

Microb Genom. 2025 Mar;11(3). doi: 10.1099/mgen.0.001347.

References

Building Phylogenetic Trees From Genome Sequences With kSNP4.

Mol Biol Evol. 2023 Nov 3;40(11). doi: 10.1093/molbev/msad235.

fastlin: an ultra-fast program for Mycobacterium tuberculosis complex lineage typing.

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad648.

Genomic epidemiology of human candidaemia isolates in a tertiary hospital.

Microb Genom. 2023 Jul;9(7). doi: 10.1099/mgen.0.001047.

Themisto: a scalable colored k-mer index for sensitive pseudoalignment against hundreds of thousands of bacterial genomes.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i260-i269. doi: 10.1093/bioinformatics/btad233.

Split k-mer analysis compared to cgMLST and SNP-based core genome analysis for detecting transmission of vancomycin-resistant enterococci: results from routine outbreak analyses across different hospitals and hospitals networks in Berlin, Germany.

Microb Genom. 2023 Jan;9(1). doi: 10.1099/mgen.0.000937.

Comparison of R9.4.1/Kit10 and R10/Kit12 Oxford Nanopore flowcells and chemistries in bacterial genome reconstruction.

Microb Genom. 2023 Jan;9(1). doi: 10.1099/mgen.0.000910.

Pseudomonas aeruginosa aggregation and Psl expression in sputum is associated with antibiotic eradication failure in children with cystic fibrosis.

Sci Rep. 2022 Dec 12;12(1):21444. doi: 10.1038/s41598-022-25889-6.

MGnify: the microbiome sequence data analysis resource in 2023.

Nucleic Acids Res. 2023 Jan 6;51(D1):D753-D759. doi: 10.1093/nar/gkac1080.

Emergence and dissemination of antimicrobial resistance in Escherichia coli causing bloodstream infections in Norway in 2002-17: a nationwide, longitudinal, microbial population genomic study.

Lancet Microbe. 2021 Jul;2(7):e331-e341. doi: 10.1016/S2666-5247(21)00031-8. Epub 2021 May 10.

phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets.

PLoS Comput Biol. 2022 Apr 29;18(4):e1010056. doi: 10.1371/journal.pcbi.1010056. eCollection 2022 Apr.
