Fristensky B
Nucleic Acids Res. 1986 Jan 10;14(1):597-610. doi: 10.1093/nar/14.1.597.
Dot-matrix sequence similarity searches can be greatly sped up by using a table that lists all locations of short oligomers in one of the sequences to find potential similarities with a second sequence. The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of L × M × N/S^k, where S is the alphabet size and k is the length of the oligomer. For nucleic acids, in which S = 4, use of a tetranucleotide table (k = 4, so S^k = 4^4 = 256) results in an efficiency of L × M × N/256. The simplicity of the approach allows for a straightforward calculation of the level of similarities expected to be found for given search parameters. Furthermore, the storage required is minimal, allowing even large sequences to be compared on small microcomputers. Theoretical considerations regarding the use of this search are discussed.
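The table-lookup idea in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names, the `window` and `min_matches` parameters, and the rule of scoring a window by counting identical residues are all assumptions made for illustration. The sketch indexes every oligomer of length k in one sequence, then compares L-residue windows only at positions where the second sequence shares an oligomer with the first, which is roughly where the 1/S^k reduction in work comes from for random sequences.

```python
from collections import defaultdict


def build_kmer_table(seq, k):
    """Map each length-k oligomer in seq to the list of its start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def find_similarities(seq_a, seq_b, k=4, window=10, min_matches=8):
    """Return (i, j, matches) for windows that start with a shared k-mer.

    Assumptions (not from the paper): window plays the role of L,
    window >= k, and a window is reported when at least min_matches
    of its residues are identical.
    """
    table = build_kmer_table(seq_a, k)
    hits = []
    for j in range(len(seq_b) - window + 1):
        # Only positions in seq_a sharing this k-mer are ever compared.
        for i in table.get(seq_b[j:j + k], ()):
            if i + window > len(seq_a):
                continue  # window would run past the end of seq_a
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                hits.append((i, j, matches))
    return hits


def expected_comparisons(l, m, n, s=4, k=4):
    """Evaluate the abstract's efficiency expression L * M * N / S**k."""
    return l * m * n / s ** k


if __name__ == "__main__":
    a = "ACGTACGTTTGACCA"
    b = "GGACGTACGTTTCC"
    print(find_similarities(a, b, k=4, window=8, min_matches=8))
    print(expected_comparisons(10, 1000, 1000))
```

With S = 4 and k = 4, `expected_comparisons` divides L × M × N by 256, matching the tetranucleotide case in the abstract. A dictionary keyed by oligomer strings is used here for readability. An implementation aimed at small machines would more likely use a fixed array of S^k entries.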