Arvestad L, Bruno W J
Theoretical Biology and Biophysics, Los Alamos National Laboratory, NM 87545, USA.
J Mol Evol. 1997 Dec;45(6):696-703. doi: 10.1007/pl00006274.
We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences. In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.
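The two-sequence case mentioned above can be illustrated with the standard general-reversible construction: symmetrize the observed base-pair counts, convert them to a transition matrix, and take its matrix logarithm. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`pair_counts`, `pooled_counts`, `reversible_rate_matrix`, `pairwise_distance`, `normalize_rate_matrix`) are hypothetical, and pooling pairs by a weighted sum of their count matrices is an assumed stand-in for the weighting scheme of the paper, which the abstract does not specify.

```python
import numpy as np

BASES = "ACGT"


def pair_counts(seq_a, seq_b):
    """Count aligned base pairs in a 4x4 matrix, skipping gaps and ambiguity codes."""
    index = {base: i for i, base in enumerate(BASES)}
    counts = np.zeros((4, 4))
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in index and b in index:
            counts[index[a], index[b]] += 1
    return counts


def pooled_counts(count_matrices, weights=None):
    """Weighted sum of per-pair count matrices (assumption: simple linear pooling)."""
    if weights is None:
        weights = np.ones(len(count_matrices))
    return sum(w * c for w, c in zip(weights, count_matrices))


def reversible_rate_matrix(counts):
    """Return (pi, Q), where Q is the rate matrix times the elapsed time.

    Reversibility is imposed by symmetrizing the counts. The matrix
    logarithm is taken through the symmetric matrix
    S = D^(-1/2) F D^(-1/2), which is similar to P = D^(-1) F.
    """
    sym = counts + counts.T
    freq = sym / sym.sum()          # joint frequencies F (symmetric)
    pi = freq.sum(axis=1)           # estimated base frequencies
    root = np.sqrt(pi)
    s = freq / np.outer(root, root)
    eigvals, eigvecs = np.linalg.eigh(s)
    if np.any(eigvals <= 0):
        raise ValueError("sequences too divergent: logarithm undefined")
    log_s = (eigvecs * np.log(eigvals)) @ eigvecs.T
    # Undo the similarity transform: Q = D^(-1/2) log(S) D^(1/2)
    q = log_s * root[np.newaxis, :] / root[:, np.newaxis]
    return pi, q


def pairwise_distance(counts):
    """Expected substitutions per site: -sum_i pi_i * Q_ii."""
    pi, q = reversible_rate_matrix(counts)
    return -np.dot(pi, np.diag(q))


def normalize_rate_matrix(pi, q):
    """Rescale Q so that one time unit equals one expected substitution per site."""
    return q / -np.dot(pi, np.diag(q))
```

For a single pair, `pairwise_distance(pair_counts(a, b))` gives the general-reversible distance. For a collection, one might pass the per-pair count matrices through `pooled_counts` before calling `reversible_rate_matrix`; no knowledge of the tree is needed for that step, though the choice of weights here is left to the caller.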