Ruzzo W L, Tompa M
Department of Computer Science and Engineering, University of Washington, Seattle 98195-2350, USA.
Proc Int Conf Intell Syst Mol Biol. 1999:234-41.
Given a sequence of real numbers ("scores"), we present a practical linear time algorithm to find those nonoverlapping, contiguous subsequences having greatest total scores. This improves on the best previously known algorithm, which requires quadratic time in the worst case. The problem arises in biological sequence analysis, where the high-scoring subsequences correspond to regions of unusual composition in a nucleic acid or protein sequence. For instance, Altschul, Karlin, and others have used this approach to identify transmembrane regions, DNA binding domains, and regions of high charge in proteins.
给定一个实数序列(“分数”),我们提出一种实用的线性时间算法,以找到那些具有最大总分的非重叠连续子序列。这改进了之前已知的最佳算法,该算法在最坏情况下需要二次时间。这个问题出现在生物序列分析中,其中高分的子序列对应于核酸或蛋白质序列中组成异常的区域。例如,阿尔茨舒尔、卡林等人已经使用这种方法来识别跨膜区域、DNA结合结构域以及蛋白质中的高电荷区域。