Thorne J L, Churchill G A
Biometrics Unit, Cornell University, Ithaca, New York 14853, USA.
Biometrics. 1995 Mar;51(1):100-13.
The problem of estimating the relatedness of a pair of biological sequences is addressed. A stochastic model of sequence evolution is described that allows insertion and deletion as well as replacement of amino acid residues (or substitution of nucleotides) over time. An expectation-maximization (EM) algorithm that obtains maximum likelihood estimates of the model parameters is introduced. The method assumes that the sequences are related by descent from a common ancestor but the alignment (i.e., the precise evolutionary correspondence between residues in each sequence) is unknown. Results from the E-step of the EM algorithm are used to assess the likelihood that any two residues are related by direct descent from a common ancestor.
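The last sentence of the abstract, on using E-step results to judge whether two residues are related by descent, can be illustrated with a small forward-backward computation. The sketch below is not the authors' model: it assumes a generic three-state pair hidden Markov model (aligned pair, residue in the first sequence only, residue in the second sequence only) over nucleotides. The function name `match_posteriors` and the parameters `delta` (gap opening), `eps` (gap extension) and `same` (probability that an aligned pair is identical) are hypothetical and held fixed here. In a full EM procedure, the M-step would re-estimate such parameters from the expected counts that the same forward and backward tables provide.

```python
def match_posteriors(a, b, delta=0.1, eps=0.3, same=0.85):
    """Posterior probability that a[i] and b[j] are aligned, for all i, j.

    Toy pair HMM with assumed, fixed parameters (not the model of the paper).
    """
    n, m = len(a), len(b)
    q = 0.25  # assumed uniform nucleotide frequency

    def pair(x, y):
        # Assumed joint emission probability of an aligned pair of residues.
        return q * (same if x == y else (1.0 - same) / 3.0)

    # States: M = aligned pair, X = residue of a only, Y = residue of b only.
    states = "MXY"
    t = {"MM": 1 - 2 * delta, "MX": delta, "MY": delta,
         "XM": 1 - eps, "XX": eps, "XY": 0.0,
         "YM": 1 - eps, "YY": eps, "YX": 0.0}

    # Forward table: f[s][i][j] is the total weight of all partial alignments
    # of a[:i] and b[:j] that end in state s.
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in states}
    f["M"][0][0] = 1.0  # virtual start, treated as state M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f["M"][i][j] = pair(a[i - 1], b[j - 1]) * sum(
                    f[s][i - 1][j - 1] * t[s + "M"] for s in states)
            if i > 0:
                f["X"][i][j] = q * sum(
                    f[s][i - 1][j] * t[s + "X"] for s in states)
            if j > 0:
                f["Y"][i][j] = q * sum(
                    f[s][i][j - 1] * t[s + "Y"] for s in states)

    # Backward table: bk[s][i][j] is the total weight of all ways to finish
    # the alignment of a[i:] and b[j:], given that the current state is s.
    bk = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in states}
    for s in states:
        bk[s][n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            for s in states:
                v = 0.0
                if i < n and j < m:
                    v += t[s + "M"] * pair(a[i], b[j]) * bk["M"][i + 1][j + 1]
                if i < n:
                    v += t[s + "X"] * q * bk["X"][i + 1][j]
                if j < m:
                    v += t[s + "Y"] * q * bk["Y"][i][j + 1]
                bk[s][i][j] = v

    # Total weight of all complete alignments of a and b.
    total = sum(f[s][n][m] for s in states)

    # Share of that weight carried by alignments that pair a[i] with b[j].
    return [[f["M"][i][j] * bk["M"][i][j] / total
             for j in range(1, m + 1)]
            for i in range(1, n + 1)]


if __name__ == "__main__":
    for row in match_posteriors("ACGTT", "ACTT"):
        print(" ".join(f"{p:.3f}" for p in row))
```

Each row of the result corresponds to one residue of the first sequence. The row sum is at most 1, and the shortfall is the posterior probability that the residue has no counterpart in the second sequence. Summing over all alignments, rather than committing to a single best one, is what allows the reliability of each residue pairing to be assessed while the alignment itself remains unknown.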