Zhang J, Nei M
Institute of Molecular Evolutionary Genetics, Pennsylvania State University, Mueller Laboratory, University Park 16802, USA.
J Mol Evol. 1997;44 Suppl 1:S139-46. doi: 10.1007/pl00000067.
Information about protein sequences of ancestral organisms is important for identifying critical amino acid substitutions that have caused the functional change of proteins in evolution. Using computer simulation, we studied the accuracy of ancestral amino acids inferred by two currently available methods (maximum-parsimony [MP] and maximum-likelihood [ML] methods) in addition to a distance method, which was newly developed in this paper. All three methods give reliable inference when the divergence of amino acid sequences is low. When the extent of sequence divergence is high, however, the ML and distance methods give more accurate results than the MP method, particularly when the phylogenetic tree includes long branches. The accuracy of inferred ancestral amino acids does not change very much when a few present-day sequences are added or eliminated. When an incorrect model of amino acid substitution is used for the ML and distance methods, the accuracy decreases, but it is still higher than that for the MP method. When the tree topology used is partially incorrect, the accuracy in the correct part of the tree is virtually unaffected. The posterior probability of inferred ancestral amino acids computed by the ML and distance methods is an unbiased estimate of the true probability when a correct substitution model is used but may become an overestimate when a simpler model is used.
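The contrast the abstract draws between parsimony and likelihood-based inference on trees with long branches can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes a star tree with a single ancestral node, a Poisson (equal-rate) substitution model over 20 amino acids, a uniform prior on the ancestral state, and branch lengths measured in expected substitutions per site. The function names `poisson_transition_prob`, `ancestral_posteriors`, and `parsimony_state` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_STATES = len(AMINO_ACIDS)


def poisson_transition_prob(same, t):
    """Probability of ending in the same (or one specific different) state
    after a branch of length t under an equal-rate 20-state model.
    Assumption: t is the expected number of substitutions per site."""
    decay = math.exp(-N_STATES / (N_STATES - 1) * t)
    if same:
        return 1 / N_STATES + (N_STATES - 1) / N_STATES * decay
    return 1 / N_STATES - decay / N_STATES


def ancestral_posteriors(leaf_states, branch_lengths):
    """Posterior probability of each amino acid at the ancestral node of a
    star tree, given the observed leaf states and their branch lengths."""
    joint = {}
    for anc in AMINO_ACIDS:
        likelihood = 1.0
        for state, t in zip(leaf_states, branch_lengths):
            likelihood *= poisson_transition_prob(anc == state, t)
        joint[anc] = likelihood / N_STATES  # uniform prior (assumption)
    total = sum(joint.values())
    return {aa: value / total for aa, value in joint.items()}


def parsimony_state(leaf_states):
    """On a star tree, the most frequent leaf state minimizes the number of
    changes; branch lengths play no role in this choice."""
    return Counter(leaf_states).most_common(1)[0][0]


if __name__ == "__main__":
    # One leaf on a very short branch (A), two leaves on long branches (L, L).
    leaves, lengths = "ALL", [0.01, 2.0, 2.0]
    post = ancestral_posteriors(leaves, lengths)
    best = max(post, key=post.get)
    print("parsimony choice:", parsimony_state(leaves))
    print("most probable ancestral state:", best, "posterior:", round(post[best], 3))
```

In this toy case parsimony selects L by majority, while the posterior calculation favors A because the branch leading to A is short and the branches leading to L are long enough that the leaf states carry little information about the ancestor. The example only shows how branch lengths enter the likelihood-based calculation; it says nothing about the accuracies reported in the simulations.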