
Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution.

Affiliations

Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.

Department of Global Health, Amsterdam Institute for Global Health and Development, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.

Publication information

Microb Genom. 2022 Mar;8(3). doi: 10.1099/mgen.0.000799.

DOI: 10.1099/mgen.0.000799
PMID: 35290758
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9176278/
Abstract

Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to the genetic diversity usually observed within bacterial species (≥95% average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 19 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and ska achieve accuracy similar to that of reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or skesa outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at the species level, especially in the absence of suitable reference genomes. We show genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.
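
The abstract does not say which metric was used to score topological accuracy. One widely used option for comparing an inferred tree with the known simulated tree is the Robinson-Foulds distance, which counts the bipartitions found in one tree but not the other. The following is a minimal sketch under that assumption, not the authors' implementation: trees are given as nested tuples of leaf names, and the names `leaves`, `splits` and `rf_distance` are illustrative.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions of the tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Count the bipartitions present in exactly one of the two trees."""
    if leaves(a) != leaves(b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(a) ^ splits(b))


# Hypothetical example: a "true" simulated tree and a workflow's inferred tree.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
rerooted_true = ("A", ("B", ("C", ("D", "E"))))

print(rf_distance(true_tree, inferred_tree))
print(rf_distance(true_tree, rerooted_true))
```

A distance of 0 means the two topologies are identical regardless of rooting. Larger values mean more disagreeing branches, so the score of each workflow can be compared across simulation replicates.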


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/187c/9176278/bc5d0e7f464c/mgen-8-0799-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/187c/9176278/b77024c8190c/mgen-8-0799-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/187c/9176278/b244a47e107b/mgen-8-0799-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/187c/9176278/a9ac9f9b31b5/mgen-8-0799-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/187c/9176278/1aed6ebc1d98/mgen-8-0799-g005.jpg


