Fitch W M
J Mol Biol. 1983 Jan 15;163(2):171-6. doi: 10.1016/0022-2836(83)90002-5.
The comparison of protein or nucleic acid sequences frequently leads to observations whose improbability can be tested only by Monte Carlo techniques that require randomizing the sequences being compared. Two decisions need to be made. The first is whether one demands that the resulting random sequence have the properties of the original sequence (a shuffled sequence) or only expects it to have them (a representative sequence). The second concerns which properties of the sequence are at issue, two of which are composition and nearest-neighbor frequencies. It is shown that biased nearest-neighbor frequencies can significantly affect the probability of observing a given result. Methods for producing random sequences according to these decisions are given.
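The distinction between a shuffled and a representative sequence can be illustrated with a minimal Python sketch. This is not the paper's own procedure, which the abstract does not spell out. The function names are hypothetical, and the nearest-neighbor variant uses a simple first-order Markov chain as a stand-in, so it reproduces neighbor frequencies only in expectation. A shuffle that preserves nearest-neighbor counts exactly is more involved and is not shown.

```python
import random
from collections import Counter, defaultdict


def shuffled_sequence(seq, rng):
    """Permute the symbols of seq, so the composition is reproduced exactly."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def representative_sequence(seq, rng):
    """Draw every position independently from the composition of seq.

    The composition of the result matches the original only in expectation.
    """
    counts = Counter(seq)
    symbols = sorted(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=len(seq)))


def representative_neighbor_sequence(seq, rng):
    """Sample a first-order Markov chain fitted to the nearest-neighbor counts.

    Assumptions of this sketch: the chain starts at the original first symbol,
    and a symbol with no observed successor falls back to the overall
    composition.
    """
    if not seq:
        return ""
    transitions = defaultdict(Counter)
    for left, right in zip(seq, seq[1:]):
        transitions[left][right] += 1
    composition = Counter(seq)
    out = [seq[0]]
    while len(out) < len(seq):
        successors = transitions.get(out[-1])
        table = successors if successors else composition
        symbols = sorted(table)
        weights = [table[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(1983)  # fixed seed so runs are repeatable
    original = "ACGTTGCAACGGTACCGTAA"
    print(shuffled_sequence(original, rng))
    print(representative_sequence(original, rng))
    print(representative_neighbor_sequence(original, rng))
```

In a Monte Carlo test, one would generate many such sequences, recompute the comparison statistic on each, and report the fraction that score at least as well as the observed result. Which generator is used determines the null hypothesis being tested.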