Repeats identification using improved suffix trees.

Author information

Huo Hongwei, Wang Xiaowu, Stojkovic Vojislav

Affiliation

School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China.

Publication information

Int J Comput Biol Drug Des. 2009;2(3):264-77. doi: 10.1504/IJCBDD.2009.030117. Epub 2009 Dec 10.

DOI:10.1504/IJCBDD.2009.030117

PMID:20090164

Abstract

The suffix tree data structure plays an important role in the efficient implementation of some querying algorithms. This paper presents the fast Rep(eats)Seeker algorithm for repeat identification, based on improvements to suffix tree construction. During construction of the suffix tree, leaf nodes and branch nodes are numbered in different ways, and extra information is added to the branch nodes. The experimental results show that these improvements reduce the running time of the RepSeeker algorithm without loss of accuracy, and that they agree with the theoretical expectations.
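
To illustrate the general idea of reading repeats off the branch nodes of a suffix structure, here is a minimal sketch. It is not the RepSeeker algorithm and does not reproduce the paper's node numbering scheme. It assumes a naive, uncompressed suffix trie built in quadratic time, and the "extra information" stored on each node is assumed here to be the list of suffix start positions. All names (`Node`, `build_suffix_trie`, `find_repeats`, `min_len`) are hypothetical.

```python
from typing import Dict, List, Tuple


class Node:
    """Trie node; `starts` is the extra information kept on each node (an assumption)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.starts: List[int] = []  # start positions of suffixes passing through


def build_suffix_trie(text: str) -> Node:
    """Naive O(n^2) construction: insert every suffix of text + '$'."""
    text += "$"  # terminator, assumed not to occur in the input
    root = Node()
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(i)
    return root


def find_repeats(text: str, min_len: int = 2) -> List[Tuple[str, List[int]]]:
    """Return (substring, start positions) for each right-maximal repeat."""
    root = build_suffix_trie(text)
    repeats = []
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        # A node with two or more children corresponds to a branch node of the
        # compressed suffix tree: its label occurs at least twice in the text,
        # followed by different characters.
        if len(node.children) >= 2 and len(label) >= min_len:
            repeats.append((label, sorted(node.starts)))
        for ch, child in node.children.items():
            stack.append((child, label + ch))
    return sorted(repeats)


if __name__ == "__main__":
    for substring, positions in find_repeats("ACGACGT"):
        print(substring, positions)
```

A production implementation would use a compressed suffix tree built in linear time. The trie is used here only because it keeps the link between branch nodes and repeated substrings easy to see.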