Search and clustering orders of magnitude faster than BLAST.

Author information

Tiburon, CA 94920, USA.

Publication information

Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12.

Abstract

MOTIVATION

Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification.

RESULTS

UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
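
As an illustration of what "assigning sequences to clusters" by search can look like, here is a minimal sketch of greedy centroid-based clustering. It rests on assumptions that the abstract does not state: the greedy first-match strategy, the `identity` and `greedy_cluster` function names, and the toy position-wise identity measure are all hypothetical stand-ins. In particular, the linear scan over centroids takes the place of a fast database search such as USEARCH, and it should not be read as the published UCLUST algorithm.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity: matching positions divided by the longer length.

    Hypothetical stand-in for an alignment-based identity; this is
    not the measure that USEARCH computes.
    """
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: List[str], threshold: float) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid that meets the threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    The result maps each centroid to its members (centroid included).
    """
    clusters: Dict[str, List[str]] = {}
    for seq in seqs:
        # Linear scan over centroids; a real tool would use a fast search here.
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No centroid was similar enough, so start a new cluster.
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    example = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "TTTTCCCG"]
    for centroid, members in greedy_cluster(example, 0.85).items():
        print(centroid, members)
```

In this sketch the result depends on input order, because earlier sequences are the ones that become centroids. The identity threshold plays the role of the clustering identity mentioned in the abstract.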

AVAILABILITY

Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch.
