PyOrthoANI, PyFastANI, and Pyskani: a suite of Python libraries for computation of average nucleotide identity.

Authors

Larralde Martin, Zeller Georg, Carroll Laura M

Affiliations

Structural and Computational Biology Unit, EMBL, 69117 Heidelberg, Germany.

Leiden University Center for Infectious Diseases (LUCID), Leiden University Medical Center, 2333ZA Leiden, Netherlands.

Publication information

NAR Genom Bioinform. 2025 Jul 11;7(3):lqaf095. doi: 10.1093/nargab/lqaf095. eCollection 2025 Sep.

DOI:10.1093/nargab/lqaf095

PMID:40657423

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12246781/

Abstract

The average nucleotide identity (ANI) metric has become the gold standard for prokaryotic species delineation in the genomics era. The most popular ANI algorithms are available as command-line tools and/or web applications, which makes them inconvenient to incorporate into bioinformatic workflows that use the popular Python programming language. Here, we present PyOrthoANI, PyFastANI, and Pyskani, Python libraries for three popular ANI computation methods. ANI values produced by PyOrthoANI, PyFastANI, and Pyskani are virtually identical to those produced by OrthoANI, FastANI, and skani, respectively (adjusted R² > 0.999). Compared to OrthoANI, PyOrthoANI is, on average, 3× faster per genome, while PyFastANI has multithreading support for single queries. All three libraries integrate seamlessly with BioPython, making it easy and convenient to use, compare, and benchmark popular ANI algorithms within Python-based bioinformatic workflows, software programs, and notebooks. Each library is available from the Python Package Index repository under the open-source MIT license, with source code available via GitHub (PyOrthoANI, https://github.com/althonos/orthoani; PyFastANI, https://github.com/althonos/pyfastani; Pyskani, https://github.com/althonos/pyskani).
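As a rough illustration of the agreement check described in the abstract, the sketch below computes an adjusted R² for a simple linear regression of one set of ANI values on another. It assumes that the reported agreement statistic is an adjusted R² from an ordinary least-squares fit with a single predictor; the abstract does not spell out the model. The function name `adjusted_r_squared` and the ANI values are hypothetical and are not taken from the paper or from any of the three libraries.

```python
def adjusted_r_squared(x, y, predictors=1):
    """Adjusted R^2 of an ordinary least-squares fit y ~ a + b*x.

    Hypothetical helper for illustration; not part of PyOrthoANI,
    PyFastANI, or Pyskani.
    """
    n = len(x)
    if n != len(y) or n <= predictors + 1:
        raise ValueError("need equal-length inputs with more points than parameters")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    # Least-squares slope and intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and plain R^2.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy

    # Penalise R^2 for the number of predictors in the model.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - predictors - 1)


# Made-up ANI values (percent) for six genome pairs: one list standing in for
# a command-line tool, the other for its Python counterpart.
cli_ani = [78.2, 85.4, 91.0, 95.1, 97.8, 99.3]
lib_ani = [78.2, 85.5, 91.0, 95.1, 97.7, 99.3]

print(round(adjusted_r_squared(cli_ani, lib_ani), 4))
```

In practice, the two lists would hold ANI values produced for the same genome pairs by a command-line tool and by the corresponding Python library.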
