Fast sequence clustering using a suffix array algorithm.

Author information

Malde Ketil, Coward Eivind, Jonassen Inge

Affiliation

Department of Informatics, University of Bergen, HIB, N5020 Norway.

Publication information

Bioinformatics. 2003 Jul 1;19(10):1221-6. doi: 10.1093/bioinformatics/btg138.

PMID:12835265

Abstract

MOTIVATION

Efficient clustering is important for handling the large number of available expressed sequence tag (EST) sequences. Most contemporary methods are based on some kind of all-against-all comparison, resulting in quadratic time complexity. A different approach is needed to keep up with the rapid growth of EST data.

RESULTS

A new, fast EST clustering algorithm is presented. Sub-quadratic time complexity is achieved by using an algorithm based on suffix arrays. A prototype implementation has been developed and run on a benchmark data set. The produced clusterings are validated by comparing them to clusterings produced by other methods, and the results are quite promising.
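
The general idea of clustering through a suffix array can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_match` threshold, and the rule "two sequences belong together if they share an exact substring of at least `min_match` bases" are all assumptions made for illustration. The sketch also builds the suffix array by naive sorting, which is not sub-quadratic, and it ignores reverse complements and sequencing errors.

```python
def build_suffix_array(seqs, min_match):
    """Return (sequence index, offset) pairs sorted by the suffix they denote.

    Naive construction by sorting suffix strings; fine for a toy example,
    far too slow and memory-hungry for real EST data.
    """
    entries = [
        (i, j)
        for i, s in enumerate(seqs)
        for j in range(len(s) - min_match + 1)  # keep suffixes >= min_match long
    ]
    entries.sort(key=lambda e: seqs[e[0]][e[1]:])
    return entries


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster(seqs, min_match=8):
    """Group sequences that share an exact substring of length >= min_match.

    Hypothetical clustering rule, chosen only to show the technique.
    """
    sa = build_suffix_array(seqs, min_match)
    parent = list(range(len(seqs)))

    # Suffixes sharing a prefix of length min_match are contiguous in the
    # sorted order, so checking neighbouring entries is enough: the shared
    # block is linked transitively through the union-find structure.
    for (i, a), (j, b) in zip(sa, sa[1:]):
        if i != j and seqs[i][a:a + min_match] == seqs[j][b:b + min_match]:
            parent[find(parent, i)] = find(parent, j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    toy = [
        "ACGTACGTTTGA",  # shares ACGTTTGA with the next sequence
        "TTACGTTTGACC",
        "GGGGCCCCAAAA",  # shares CCCCAAAA with the next sequence
        "CCCCAAAAGGTT",
        "ATATATATATAT",  # shares no 8-mer with the others
    ]
    print(cluster(toy, min_match=8))
```

Because only neighbouring entries in the sorted order are compared, the number of comparisons grows with the total sequence length rather than with the number of sequence pairs, which is what allows this kind of approach to avoid an all-against-all comparison.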