Tyson H, Haley B
Comput Methods Programs Biomed. 1985 Oct;21(1):3-10. doi: 10.1016/0169-2607(85)90057-4.
A program that calculates the optimum alignment between two sequences, which may be DNA, amino acid or other information, has been written in PASCAL. Sellers' algorithm for calculating the distance between sequences has been modified to reduce its demands on microcomputer memory space by more than half. Gap penalties and mismatch scores are user-adjustable. In 48 K of memory the program aligns sequences up to 170 elements in length and displays the optimum alignment and the total distance between the pair. Longer sequences are aligned by subdividing both sequences into corresponding, overlapping sections; section length and amount of section overlap are user-defined. More importantly, extending this modification of Sellers' algorithm to hardware and compilers/languages capable of using a larger memory space (e.g. 640 K) shows that it is now possible to align, without subdivision, sequences with up to 700 elements each. Computation time increases curvilinearly with the length of sequences aligned without subdivision, but total times depend essentially on the hardware/language/compiler combination. The statistical significance of an alignment is examined with conventional Monte Carlo approaches.
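As a rough illustration of the ideas named in the abstract, the following Python sketch computes a distance between two sequences with adjustable mismatch and gap weights, then estimates significance by shuffling. It is not the authors' PASCAL program. The function names (`alignment_distance`, `monte_carlo_p`), the default weights, the per-element gap penalty, and the shuffle-based null model are all assumptions made for the example. The abstract does not say how memory use was reduced; the two-row storage used here is a common technique that yields only the distance, not the displayed alignment, so it should not be read as the authors' modification.

```python
import random


def alignment_distance(a, b, mismatch=1, gap=1):
    """Global alignment distance by dynamic programming.

    Assumption: each gap element costs `gap` and each mismatch costs
    `mismatch`. Only two rows of the table are kept, so memory grows
    with len(b) rather than len(a) * len(b). This recovers the
    distance only, not the alignment itself.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            delete = prev[j] + gap      # gap in b
            insert = curr[j - 1] + gap  # gap in a
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[-1]


def monte_carlo_p(a, b, trials=200, seed=0, **weights):
    """Estimate how often a shuffled copy of `b` aligns at least as well.

    Assumption: the null model is a random permutation of `b`, which
    preserves its length and composition.
    """
    rng = random.Random(seed)
    observed = alignment_distance(a, b, **weights)
    symbols = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(symbols)
        if alignment_distance(a, symbols, **weights) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)
```

Calling `monte_carlo_p("ACGTACGTAC", "ACGTTCGTAC", trials=100)` returns the observed distance together with an empirical p-value. A small p-value suggests the observed distance is lower than expected for sequences of the same composition, under the shuffle model assumed here.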