Blaisdell B E
Department of Mathematics, Stanford University, CA 94305.
J Mol Evol. 1991 Jun;32(6):521-8. doi: 10.1007/BF02102654.
A measure of sequence similarity, d_t, that does not require prior sequence alignment gave correct results for a variety of computer-generated model sequences, both without and with gaps, at all degrees of substitution, s. The measure d_t was the squared Euclidean distance between the vectors of counts of t-tuplets of characters in the two sequences. In models without gaps, where no Needleman-Wunsch alignment was performed, the average d_t was very nearly equal to twice the average conventional mismatch count, m. In these models, each condition of the Jukes-Cantor model was violated in turn, one at a time: (1) both descendant lineages receive the same number of substitutions, (2) all sites are equally likely to be substituted, (3) all different replacement characters are equally likely to be chosen, and (4) all original characters are equally likely to be substituted. In Jukes-Cantor models with gaps, Needleman-Wunsch alignment was necessarily performed, and this procedure generally produced incorrect values of m. For these models, the average d_t was found to be very nearly equal to twice the average m estimated from the known value of s using the inverted Jukes-Cantor formula.
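The quantities named in the abstract can be illustrated with a short Python sketch. This is not the author's code. It rests on several assumptions: t-tuplets are counted with an overlapping sliding window, counts are left unnormalized, and the alphabet has four letters. It also assumes s is the expected number of substitutions per site separating the two sequences. The paper may normalize d_t or define s differently, and the abstract alone does not settle either point. All function names are hypothetical. The sketch computes d_t without any alignment. For comparison, it computes the mismatch count m for equal-length, ungapped sequences, and the expected m obtained by inverting the Jukes-Cantor formula.

```python
import math
from collections import Counter


def tuple_counts(seq: str, t: int) -> Counter:
    """Count overlapping t-tuplets (substrings of length t) in seq.

    Assumption: a sliding window with step 1 is used."""
    return Counter(seq[i:i + t] for i in range(len(seq) - t + 1))


def d_t(a: str, b: str, t: int) -> int:
    """Squared Euclidean distance between the t-tuplet count vectors of a and b.

    A tuplet absent from one sequence has count 0 there (Counter returns 0
    for missing keys), so summing over the union of observed tuplets equals
    summing over every possible tuplet. No alignment is needed."""
    ca, cb = tuple_counts(a, t), tuple_counts(b, t)
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))


def mismatches(a: str, b: str) -> int:
    """Conventional mismatch count m for two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def jc_expected_mismatches(s: float, length: int) -> float:
    """Expected mismatch count from the inverted Jukes-Cantor formula.

    Assumes a 4-letter alphabet and s = expected substitutions per site
    between the two sequences: p = (3/4) * (1 - exp(-4s/3)), m = length * p."""
    return length * 0.75 * (1.0 - math.exp(-4.0 * s / 3.0))
```

With t = 1, a single substitution (for example A to C) raises one count by 1 and lowers another by 1, so d_1 = 2 = 2m. When several substitutions are present, changes can offset each other in the count vector. For this reason, the sketch does not force d_t to equal 2m for any particular pair of sequences.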