SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles.

Affiliation

Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, Nova Scotia, B3H 4R2, Canada.

Publication information

Bioinformatics. 2013 Aug 1;29(15):1858-64. doi: 10.1093/bioinformatics/btt313. Epub 2013 Jun 3.

DOI:10.1093/bioinformatics/btt313

PMID:23732273

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3712219/

Abstract

BACKGROUND

Homology-based taxonomic assignment is impeded by differences between the unassigned read and the reference database, forcing a rank-specific classification to the closest (and possibly incorrect) reference lineage. This assignment may be correct only to a general rank (e.g. order) and incorrect below that rank (e.g. family and genus). Algorithms such as LCA (lowest common ancestor) avoid this by varying the predicted taxonomic rank based on matches to a set of taxonomic references. LCA and related approaches can be conservative, especially if the best matches are taxonomically widespread because of events such as lateral gene transfer (LGT).
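The LCA behaviour described above, and why lateral gene transfer makes it conservative, can be illustrated with a minimal Python sketch. It assumes each best hit is given as a lineage ordered from domain to genus together with a bit score, and that hits scoring within a hypothetical `top_percent` of the best score form the set of best matches. The function names, lineages and scores are illustrative only and are not taken from SPANNER or from any particular LCA tool.

```python
from typing import List, Sequence, Tuple

# A lineage lists taxa from the most general rank to the most specific one,
# e.g. domain, phylum, class, order, family, genus (illustrative ranks only).
Lineage = Sequence[str]


def lowest_common_ancestor(lineages: List[Lineage]) -> List[str]:
    """Return the longest lineage prefix shared by every lineage."""
    shared: List[str] = []
    if not lineages:
        return shared
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break  # the hits disagree at this rank, so stop here
        shared.append(taxa_at_rank[0])
    return shared


def lca_assign(hits: List[Tuple[Lineage, float]], top_percent: float = 10.0) -> List[str]:
    """Keep hits scoring within `top_percent` of the best score, then take their LCA.

    The top-percent filter is an assumption made for this sketch.
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    cutoff = best * (1.0 - top_percent / 100.0)
    kept = [lineage for lineage, score in hits if score >= cutoff]
    return lowest_common_ancestor(kept)


if __name__ == "__main__":
    # Hypothetical best hits with made-up bit scores.
    eco = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
           "Enterobacterales", "Enterobacteriaceae", "Escherichia")
    sal = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
           "Enterobacterales", "Enterobacteriaceae", "Salmonella")
    bsu = ("Bacteria", "Firmicutes", "Bacilli",
           "Bacillales", "Bacillaceae", "Bacillus")
    hits = [(eco, 200.0), (sal, 195.0), (bsu, 120.0)]
    print(lca_assign(hits))                    # stops at family Enterobacteriaceae
    print(lca_assign(hits, top_percent=50.0))  # ['Bacteria']
```

With two Enterobacteriaceae hits and one lower-scoring Bacillus hit (standing in for a laterally transferred gene), the 10% filter keeps only the Enterobacteriaceae hits and the assignment stops at family; widening the filter admits the Bacillus hit and the assignment collapses to the domain Bacteria.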

RESULTS

Our extension to LCA, called SPANNER (similarity profile annotater), uses the set of best homology matches (the LCA Profile) for a given sequence and compares this profile with a set of profiles inferred from taxonomic reference organisms. SPANNER provides an assignment that is less sensitive to LGT and other confounding phenomena. In a series of trials on real and artificial datasets, SPANNER outperformed LCA-style algorithms in terms of taxonomic precision and outperformed best-BLAST assignment at certain levels of taxonomic novelty in the dataset. We identify examples where LCA made an overly conservative prediction but SPANNER produced a more precise and correct one.
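SPANNER compares a sequence's profile of best homology matches against profiles inferred from reference organisms using pyramid matching. The Python sketch below is not the published SPANNER scoring; it only illustrates the general pyramid match kernel idea (histogram intersection at several resolutions, with matches first found at finer resolutions weighted more heavily) applied to taxonomic ranks. The profile representation, the 1/2**level weights, the function names (`rank_histogram`, `pyramid_match`, `best_reference`) and the toy lineages are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical "similarity profile": the lineages of a sequence's best homology
# matches, each ordered from the most general rank to the most specific one.
Lineage = Sequence[str]


def rank_histogram(lineages: List[Lineage], depth: int) -> Dict[tuple, float]:
    """Fraction of matches per taxon, binning each lineage by its first `depth` ranks."""
    counts = Counter(tuple(lin[:depth]) for lin in lineages if len(lin) >= depth)
    total = len(lineages)
    return {taxon: n / total for taxon, n in counts.items()}


def histogram_intersection(a: Dict[tuple, float], b: Dict[tuple, float]) -> float:
    """Sum of bin-wise minima of two histograms."""
    return sum(min(value, b.get(key, 0.0)) for key, value in a.items())


def pyramid_match(query: List[Lineage], reference: List[Lineage], n_ranks: int) -> float:
    """Pyramid-match-style similarity over taxonomic ranks.

    Matches first found at the finest rank (level 0) get weight 1; matches that
    only appear at coarser ranks get weight 1 / 2**level (weights assumed for
    this sketch).
    """
    score, previous = 0.0, 0.0
    for level, depth in enumerate(range(n_ranks, 0, -1)):
        matched = histogram_intersection(rank_histogram(query, depth),
                                         rank_histogram(reference, depth))
        score += (matched - previous) / (2 ** level)  # only count new matches
        previous = matched
    return score


def best_reference(query: List[Lineage], references: Dict[str, List[Lineage]],
                   n_ranks: int) -> str:
    """Name of the reference profile most similar to the query profile."""
    return max(references, key=lambda name: pyramid_match(query, references[name], n_ranks))
```

For example, a query whose best hits are one Escherichia, one Salmonella and one Bacillus lineage (six ranks each) scores about 0.68 against a hypothetical reference profile of two Escherichia hits and one Salmonella hit, but only about 0.35 against a profile of three Bacillus hits, so `best_reference` picks the Enterobacteriaceae-like profile even though a plain LCA of the three hits would stop at Bacteria. Binning by lineage prefix keeps the histograms nested, so the intersection can only grow at coarser ranks and each level's "new matches" term is never negative.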

CONCLUSIONS

By using profiles of homology matches to represent patterns of genomic similarity that arise because of vertical and lateral inheritance, SPANNER offers an effective compromise between taxonomic assignment based on best BLAST scores and the conservative approach of LCA and related methods.

AVAILABILITY

C++ source code and binaries are freely available at http://kiwi.cs.dal.ca/Software/SPANNER.

CONTACT

beiko@cs.dal.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles.

Bioinformatics. 2013 Aug 1;29(15):1858-64. doi: 10.1093/bioinformatics/btt313. Epub 2013 Jun 3.

LCA*: an entropy-based measure for taxonomic assignment within assembled metagenomes.

Bioinformatics. 2016 Dec 1;32(23):3535-3542. doi: 10.1093/bioinformatics/btw400. Epub 2016 Aug 11.

Classifying short genomic fragments from novel lineages using composition and homology.

BMC Bioinformatics. 2011 Aug 9;12:328. doi: 10.1186/1471-2105-12-328.

Fast and accurate phylogeny reconstruction using filtered spaced-word matches.

Bioinformatics. 2017 Apr 1;33(7):971-979. doi: 10.1093/bioinformatics/btw776.

SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences.

Bioinformatics. 2009 Jul 15;25(14):1722-30. doi: 10.1093/bioinformatics/btp317. Epub 2009 May 13.

A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

BMC Bioinformatics. 2017 May 10;18(1):247. doi: 10.1186/s12859-017-1670-4.

MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks.

Bioinformatics. 2011 Jan 15;27(2):196-203. doi: 10.1093/bioinformatics/btq649. Epub 2010 Dec 1.

Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods.

Bioinformatics. 2015 Mar 15;31(6):817-24. doi: 10.1093/bioinformatics/btu745. Epub 2014 Nov 10.

Accurate taxonomic assignment of short pyrosequencing reads.

Pac Symp Biocomput. 2010:3-9. doi: 10.1142/9789814295291_0002.

ProClust: improved clustering of protein sequences with an extended graph-based approach.

Bioinformatics. 2002;18 Suppl 2:S182-91. doi: 10.1093/bioinformatics/18.suppl_2.s182.

Cited by

Music of metagenomics-a review of its applications, analysis pipeline, and associated tools.

Funct Integr Genomics. 2022 Feb;22(1):3-26. doi: 10.1007/s10142-021-00810-y. Epub 2021 Oct 18.

Taxallnomy: an extension of NCBI Taxonomy that produces a hierarchically complete taxonomic tree.

BMC Bioinformatics. 2021 Jul 29;22(1):388. doi: 10.1186/s12859-021-04304-3.

Profundae diversitas: the uncharted genetic diversity in a newly studied group of fungal root endophytes.

Mycology. 2015 Jul 24;6(3-4):139-150. doi: 10.1080/21501203.2015.1070213. eCollection 2015.

A clinician's guide to microbiome analysis.

Nat Rev Gastroenterol Hepatol. 2017 Oct;14(10):585-595. doi: 10.1038/nrgastro.2017.97. Epub 2017 Aug 9.

Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities.

BMC Bioinformatics. 2015 Nov 4;16:363. doi: 10.1186/s12859-015-0788-5.

Metagenome fragment classification based on multiple motif-occurrence profiles.

PeerJ. 2014 Sep 4;2:e559. doi: 10.7717/peerj.559. eCollection 2014.

The Amordad database engine for metagenomics.

Bioinformatics. 2014 Oct 15;30(20):2949-55. doi: 10.1093/bioinformatics/btu405. Epub 2014 Jun 27.

References

Rapid identification of high-confidence taxonomic assignments for metagenomic data.

Nucleic Acids Res. 2012 Aug;40(14):e111. doi: 10.1093/nar/gks335. Epub 2012 Apr 24.

Classifying short genomic fragments from novel lineages using composition and homology.

BMC Bioinformatics. 2011 Aug 9;12:328. doi: 10.1186/1471-2105-12-328.

Taxonomic classification of metagenomic shotgun sequences with CARMA3.

Nucleic Acids Res. 2011 Aug;39(14):e91. doi: 10.1093/nar/gkr225. Epub 2011 May 17.

Taxonomic metagenome sequence assignment with structured output models.

Nat Methods. 2011 Mar;8(3):191-2. doi: 10.1038/nmeth0311-191.

Ab initio gene identification in metagenomic sequences.

Nucleic Acids Res. 2010 Jul;38(12):e132. doi: 10.1093/nar/gkq275. Epub 2010 Apr 19.

Distinguishing microbial genome fragments based on their composition: evolutionary and comparative genomic perspectives.

Genome Biol Evol. 2010 Jan 25;2:117-31. doi: 10.1093/gbe/evq004.

Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models.

Nat Methods. 2009 Sep;6(9):673-6. doi: 10.1038/nmeth.1358. Epub 2009 Aug 2.

SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences.

Bioinformatics. 2009 Jul 15;25(14):1722-30. doi: 10.1093/bioinformatics/btp317. Epub 2009 May 13.

TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach.

BMC Bioinformatics. 2009 Feb 11;10:56. doi: 10.1186/1471-2105-10-56.

What's in the mix: phylogenetic classification of metagenome sequence samples.

Curr Opin Microbiol. 2007 Oct;10(5):499-503. doi: 10.1016/j.mib.2007.08.004. Epub 2007 Oct 22.
