EvANI benchmarking workflow for evolutionary distance estimation.

Authors

Majidian Sina, Hwang Stephen, Zakeri Mohsen, Langmead Ben

Affiliations

Department of Computer Science, Johns Hopkins University, 3400 North Charles St., Baltimore, MD 21218, United States.

XDBio Program, Johns Hopkins University, 3400 North Charles St., Baltimore, MD 21218, United States.

Publication information

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf267.

DOI: 10.1093/bib/bbaf267
PMID: 40501070

Abstract

Advances in long-read sequencing technology have led to a rapid increase in high-quality genome assemblies. These make it possible to compare genome sequences across the Tree of Life, deepening our understanding of evolutionary relationships. Average nucleotide identity (ANI) is a metric for estimating the genetic similarity between two genomes, usually calculated as the mean identity of their shared genomic regions. These regions are typically found with genome aligners such as the Basic Local Alignment Search Tool (BLAST) or MUMmer. ANI has been applied to species delineation, building guide trees, and searching large sequence databases. Since computing ANI via genome alignment is computationally expensive, the field has increasingly turned to sketch-based approaches that use assumptions and heuristics to speed up the computation. We propose a suite of simulated and real benchmark datasets, together with a rank-correlation-based metric, to study how these assumptions and heuristics impact distance estimates. We call this evaluation framework EvANI. With EvANI, we show that ANIb is the ANI estimation algorithm that best captures tree distance, though it is also the least efficient. We show that k-mer-based approaches are extremely efficient and consistently accurate. We also show that some clades have inter-sequence distances that are best computed using multiple values of $k$, e.g. $k=10$ and $k=19$ for Chlamydiales. Finally, we highlight that approaches based on maximal exact matches may represent an advantageous compromise, achieving an intermediate level of computational efficiency while avoiding over-reliance on a single fixed k-mer length.
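The evaluation idea in the abstract, scoring a distance estimator by how well its pairwise distances rank-correlate with tree distances, can be illustrated with a minimal Python sketch. This is not the EvANI implementation. It assumes a Mash-style k-mer distance computed from exact k-mer sets (no sketching) and Spearman correlation as the rank-correlation measure; the abstract specifies neither. The toy sequences, the tree distances, and the function names (`kmers`, `mash_distance`, `average_ranks`, `spearman`) are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(seq_a, seq_b, k):
    """Mash-style distance from the exact Jaccard index of two k-mer sets.

    Assumption: this estimator is used for illustration only; real tools
    estimate the Jaccard index from sketches rather than full k-mer sets.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    j = len(a & b) / len(union) if union else 0.0
    if j == 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return abs(math.log(2 * j / (1 + j))) / k


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")


if __name__ == "__main__":
    # Hypothetical toy genomes and tree distances, not data from the paper.
    genomes = {
        "g1": "ACGTACGTTAGCCGATAGGCTAACGT",
        "g2": "ACGTACGTTAGCCGATAGGCTTACGT",
        "g3": "ACGTTCGTTAGACGATAGGCTTACGA",
        "g4": "TCGTTCGATAGACGTTAGGATTACGA",
    }
    tree_dist = {
        ("g1", "g2"): 0.02, ("g1", "g3"): 0.10, ("g1", "g4"): 0.25,
        ("g2", "g3"): 0.09, ("g2", "g4"): 0.24, ("g3", "g4"): 0.18,
    }
    pairs = list(combinations(sorted(genomes), 2))
    truth = [tree_dist[p] for p in pairs]
    for k in (3, 5):
        est = [mash_distance(genomes[a], genomes[b], k) for a, b in pairs]
        print(f"k={k}: Spearman correlation with tree distance = "
              f"{spearman(est, truth):.3f}")
```

Running the script prints one correlation per value of $k$, which mirrors, on a toy scale, the kind of comparison across k-mer lengths that the abstract describes. A rank correlation is a natural choice here because an estimator only needs to order genome pairs consistently with the tree, not reproduce the tree distances numerically.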
