Martinez H M
Nucleic Acids Res. 1983 Jul 11;11(13):4629-34. doi: 10.1093/nar/11.13.4629.
The problem of finding repeats in molecular sequences is approached as a sorting problem. This leads to a method that is linear in space complexity and N log N in expected time complexity. The implementation is straightforward, so the method can be applied to large sequences with relative ease. Of particular interest is that several sequences can be treated as a single sequence. This yields an efficient method for finding dyads and for finding common features of many sequences, such as favorable alignments.
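The sorting idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure: it is a simplified illustration that sorts the start positions of all length-k words, so that identical words end up adjacent and can be read off as repeats. The function names `find_repeats` and `find_shared`, the fixed word length `k`, and the `#` separator are assumptions made for this example, and the use of string slices as sort keys does not reproduce the linear space bound stated above.

```python
from itertools import groupby


def find_repeats(seq, k):
    """Map each length-k word occurring at least twice to its start positions."""
    def word(i):
        return seq[i:i + k]

    # Sorting positions by the word they start brings equal words together.
    starts = sorted(range(len(seq) - k + 1), key=word)

    repeats = {}
    for w, run in groupby(starts, key=word):
        positions = list(run)  # ascending, because sorted() is stable
        if len(positions) > 1:
            repeats[w] = positions
    return repeats


def find_shared(seqs, k, sep="#"):
    """Treat several sequences as one by joining them with a separator.

    Assumes `sep` does not occur in any input sequence. Words that span
    a boundary contain the separator and are discarded. Positions refer
    to the joined string.
    """
    joined = sep.join(seqs)
    return {w: p for w, p in find_repeats(joined, k).items() if sep not in w}
```

For example, `find_repeats("GATTACAGATTA", 4)` reports the words `GATT` and `ATTA`, each at two positions. Under the same assumptions, joining a sequence with its reverse complement and calling `find_shared` would be one way to look for dyads.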