Müller Tobias, Spang Rainer, Vingron Martin
Deutsches Krebsforschungszentrum, Theoretische Bioinformatik, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
Mol Biol Evol. 2002 Jan;19(1):8-13. doi: 10.1093/oxfordjournals.molbev.a003985.
The evolution of proteins is generally modeled as a Markov process acting independently on each site of the sequence. The replacement frequencies of this process must be estimated from sequence alignments. Here we compare three approaches: first, the original method of Dayhoff, Schwartz, and Orcutt (1978, Atlas Protein Seq. Struct. 5:345-352); second, the resolvent method (RV) of Müller and Vingron (2000, J. Comput. Biol. 7(6):761-776); and third, a maximum likelihood approach (ML) developed in this paper. We evaluate the methods using a highly divergent and inhomogeneous set of sequence alignments as input to the estimation procedure. ML is the method of choice for small sets of input data. Although the RV method is computationally much less demanding, it performs only slightly worse than ML. It is therefore well suited to large-scale applications.
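As a rough illustration of the Markov model the abstract refers to, the sketch below computes substitution probabilities P(t) = exp(tQ) from a rate matrix Q for a single site. It does not implement the Dayhoff, RV, or ML estimators. The three-letter alphabet, the rate values in `Q`, and the helper names `mat_mul`, `mat_exp`, and `transition_matrix` are all assumptions chosen for brevity; a real protein model would use a 20 x 20 rate matrix estimated from alignments.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, squarings=10, terms=12):
    """Matrix exponential by scaling and squaring with a Taylor series."""
    n = len(a)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in a]
    # Taylor series for exp(small), starting from the identity matrix.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(a) = exp(a / 2^s) raised to the power 2^s.
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_matrix(q, t):
    """Substitution probabilities P(t) = exp(t * Q) after evolutionary time t."""
    return mat_exp([[t * x for x in row] for row in q])


# Hypothetical symmetric rate matrix on a toy 3-letter alphabet.
# Off-diagonal entries are replacement rates; each row sums to zero.
Q = [[-0.3, 0.2, 0.1],
     [0.2, -0.5, 0.3],
     [0.1, 0.3, -0.4]]

if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0):
        print(t, transition_matrix(Q, t))
```

Each row of P(t) is a probability distribution over the residues that may replace the starting residue after time t. Because this toy `Q` is symmetric, the rows approach the uniform distribution as t grows.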