LINflow: a computational pipeline that combines an alignment-free with an alignment-based method to accelerate generation of similarity matrices for prokaryotic genomes.

Authors

Tian Long, Mazloom Reza, Heath Lenwood S, Vinatzer Boris A

Affiliations

School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA.

Department of Computer Science, Virginia Tech, Blacksburg, VA, USA.

Publication information

PeerJ. 2021 Mar 24;9:e10906. doi: 10.7717/peerj.10906. eCollection 2021.

DOI:10.7717/peerj.10906
PMID:33828908
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8000461/

Abstract

BACKGROUND

Computing genomic similarity between strains is a prerequisite for genome-based prokaryotic classification and identification. Genomic similarity was first computed as Average Nucleotide Identity (ANI) values based on the alignment of genomic fragments. Since this is computationally expensive, faster and computationally cheaper alignment-free methods have been developed to estimate ANI. However, these methods do not reach the level of accuracy of alignment-based methods.

METHODS

Here we introduce LINflow, a computational pipeline that infers pairwise genomic similarity in a set of genomes. LINflow takes advantage of the speed of the alignment-free sourmash tool to identify the genome in a dataset that is most similar to a query genome and the precision of the alignment-based pyani software to precisely compute ANI between the query genome and the most similar genome identified by sourmash. This is repeated for each new genome that is added to a dataset. The sequentially computed ANI values are stored as Life Identification Numbers (LINs), which are then used to infer all other pairwise ANI values in the set. We tested LINflow on four sets, 484 genomes in total, and compared the needed time and the generated similarity matrices with other tools.
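The sequential procedure described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LINflow implementation: the ANI thresholds in `THRESHOLDS`, the function names, and the two caller-supplied callbacks (`fast_similarity` standing in for the sourmash step and `precise_ani` standing in for the pyani step) are all hypothetical. The sketch also assumes that a LIN is a tuple with one position per threshold and that two genomes share leading positions up to the highest threshold their ANI reaches; the abstract does not spell out these details.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical ANI thresholds (percent), one per LIN position.
# These values are illustrative and are not taken from the paper.
THRESHOLDS: Tuple[float, ...] = (70.0, 80.0, 90.0, 95.0, 99.0)

Lin = Tuple[int, ...]


def shared_positions(ani: float) -> int:
    """Number of leading LIN positions two genomes share at this ANI."""
    return sum(1 for t in THRESHOLDS if ani >= t)


def assign_lin(ani_to_best: float, best_lin: Sequence[int], existing: List[Lin]) -> Lin:
    """Derive a LIN for a new genome from its ANI to the most similar genome."""
    n = len(THRESHOLDS)
    k = shared_positions(ani_to_best)
    if k == n:
        # Indistinguishable at the resolution of this threshold list.
        return tuple(best_lin)
    prefix = tuple(best_lin[:k])
    # Take the next unused number at position k among LINs with the same prefix.
    used = [lin[k] for lin in existing if lin[:k] == prefix]
    new_value = max(used, default=-1) + 1
    return prefix + (new_value,) + (0,) * (n - k - 1)


def add_genome(
    name: str,
    lins: Dict[str, Lin],
    fast_similarity: Callable[[str, str], float],  # stand-in for the alignment-free step
    precise_ani: Callable[[str, str], float],      # stand-in for the alignment-based step
) -> Lin:
    """Add one genome: rank cheaply, align once, store the result as a LIN."""
    if not lins:
        lins[name] = (0,) * len(THRESHOLDS)
        return lins[name]
    best = max(lins, key=lambda other: fast_similarity(name, other))
    ani = precise_ani(name, best)  # the only alignment-based comparison for this genome
    lins[name] = assign_lin(ani, lins[best], list(lins.values()))
    return lins[name]


def infer_ani_range(lin_a: Sequence[int], lin_b: Sequence[int]) -> Tuple[float, float]:
    """Estimate an ANI interval for any pair from the length of their shared LIN prefix."""
    k = 0
    for a, b in zip(lin_a, lin_b):
        if a != b:
            break
        k += 1
    lower = THRESHOLDS[k - 1] if k > 0 else 0.0
    upper = THRESHOLDS[k] if k < len(THRESHOLDS) else 100.0
    return lower, upper
```

In this sketch each added genome triggers exactly one precise comparison, and every other pairwise value is read off the stored LINs as an interval rather than computed. That is also why an inferred value can differ from a directly computed one.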

RESULTS

LINflow is up to 150 times faster than pyani and pairwise ANI values generated by LINflow are highly correlated with those computed by pyani. However, because LINflow infers most pairwise ANI values instead of computing them directly, ANI values occasionally depart from the ANI values computed by pyani. In conclusion, LINflow is a fast and memory-efficient pipeline to infer similarity among a large set of prokaryotic genomes. Its ability to quickly add new genome sequences to an already computed similarity matrix makes LINflow particularly useful for projects when new genome sequences need to be regularly added to an existing dataset.
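One plausible reading of the speedup is a difference in how many alignment-based comparisons each approach needs. The abstract does not give this breakdown, so the short sketch below is only an illustration. It assumes that an all-against-all run aligns every unordered pair of genomes once, that the sequential approach aligns each newly added genome against a single best match, and it uses a hypothetical function name and dataset size.

```python
from typing import Tuple


def alignment_counts(n: int) -> Tuple[int, int]:
    """Return (all-against-all pairs, sequential comparisons) for n genomes."""
    all_pairs = n * (n - 1) // 2   # every unordered pair aligned once
    sequential = max(n - 1, 0)     # one alignment per genome after the first
    return all_pairs, sequential


# Hypothetical dataset of 100 genomes.
all_pairs, sequential = alignment_counts(100)
print(all_pairs, sequential)  # 4950 99
```

The ratio between the two counts grows with the number of genomes, so the gain would be expected to depend on dataset size. Measured run times also include the alignment-free search, which this count ignores.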
