Benchmarking BLAST accuracy of genus/phyla classification of metagenomic reads.

Author information

Essinger Steven D, Rosen Gail L

Affiliation

Electrical & Computer Engineering, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA.

Publication information

Pac Symp Biocomput. 2010:10-20. doi: 10.1142/9789814295291_0003.

Abstract

Metagenomics is the study of genetic material recovered directly from environmental samples. Because few tools exist for metagenomic analysis, a natural step has been to use the popular homology tool, BLAST, to search for sequence similarity between sample fragments and an administered database. Most biologists use this method today without knowing BLAST's accuracy, especially when a particular taxonomic class is under-represented in the database. The aim of this paper is to benchmark the performance of BLAST for taxonomic classification of metagenomic datasets in a supervised setting, meaning that the database contains microbes of the same class as the 'unknown' query fragments. We examine well-represented and under-represented genera and phyla in order to study the effect of representation on the accuracy of BLAST. We conclude that on fine-resolution classes, such as genera, the accuracy of BLAST does not degrade very much with under-representation, but on highly variant classes, such as phyla, performance degrades significantly. Our analysis includes five-fold cross-validation to substantiate our findings.

