Mayrose Itay, Graur Dan, Ben-Tal Nir, Pupko Tal
Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel.
Mol Biol Evol. 2004 Sep;21(9):1781-91. doi: 10.1093/molbev/msh194. Epub 2004 Jun 16.
The degree to which an amino acid site is free to vary depends strongly on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site indicates how conserved the site is and, in turn, allows evaluation of its importance in maintaining the structure and function of the protein. When using probabilistic methods for site-specific rate inference, a few alternatives are available. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study how inference accuracy depends on the number of sequences, the branch lengths, the shape of the rate distribution, and the sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum-likelihood under a wide range of conditions, indicating that the prior incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better to estimate branch lengths first and then estimate site-specific rates; this two-step procedure was found to be superior to estimating branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site conservation for the apoptosis regulator protein Bcl-x(L).
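The contrast between the two paradigms can be illustrated with a minimal sketch. It is not the authors' implementation, and every modeling choice in it is an assumption made for illustration: a star tree with known branch lengths, an equal-rates (Poisson) amino acid model, a gamma prior on the rate with mean 1 and shape `alpha = 1`, and a simple rate grid standing in for numerical optimization and integration. The function names and the example columns are hypothetical. The maximum-likelihood estimate is the grid rate that maximizes the site likelihood; the Bayesian estimate is the posterior mean of the rate under the prior.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)  # 20 states


def p_transition(same, r, t):
    """Equal-rates (Poisson) model: probability of ending in the same state,
    or in one specific different state, along a branch of length t at rate r."""
    e = math.exp(-K / (K - 1) * r * t)
    return 1 / K + (1 - 1 / K) * e if same else (1 - e) / K


def site_likelihood(column, branch_lengths, r):
    """Likelihood of one alignment column on a star tree (assumed topology),
    summing over the unknown root state with uniform state frequencies."""
    total = 0.0
    for root in AMINO_ACIDS:
        prob = 1.0 / K
        for state, t in zip(column, branch_lengths):
            prob *= p_transition(state == root, r, t)
        total += prob
    return total


def gamma_density(r, alpha):
    """Gamma density with shape alpha and mean 1 (rate parameter alpha)."""
    log_pdf = (alpha * math.log(alpha) - math.lgamma(alpha)
               + (alpha - 1) * math.log(r) - alpha * r)
    return math.exp(log_pdf)


def ml_rate(column, branch_lengths, grid):
    """Maximum-likelihood rate, restricted to the grid."""
    return max(grid, key=lambda r: site_likelihood(column, branch_lengths, r))


def bayes_rate(column, branch_lengths, grid, alpha=1.0):
    """Posterior mean rate; the uniform grid spacing cancels in the ratio."""
    weights = [site_likelihood(column, branch_lengths, r) * gamma_density(r, alpha)
               for r in grid]
    return sum(r * w for r, w in zip(grid, weights)) / sum(weights)


# Illustrative inputs (assumed, not taken from the paper).
grid = [0.01 * i for i in range(1, 2001)]   # candidate rates, 0.01 to 20
branch_lengths = [0.1] * 6                  # six tips on a star tree
conserved = "LLLLLL"                        # fully conserved column
variable = "LAKDEG"                         # every sequence differs

for name, column in (("conserved", conserved), ("variable", variable)):
    print(name,
          "ML:", ml_rate(column, branch_lengths, grid),
          "Bayes:", round(bayes_rate(column, branch_lengths, grid), 3))
```

In this toy setting the maximum-likelihood estimate runs to the edge of the grid for both columns (the smallest rate for the conserved column, the largest for the fully variable one), whereas the posterior mean is pulled toward the prior mean and stays in the interior. This is one intuitive way to see how a prior can stabilize per-site estimates when a single column carries little information.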