Tiburon, CA 94920, USA.
Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12.
MOTIVATION: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification.
RESULTS: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
AVAILABILITY: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch.