National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research Wuhan, Huazhong Agricultural University, Wuhan 430070, China.
Proc Natl Acad Sci U S A. 2011 May 10;108(19):7860-5. doi: 10.1073/pnas.1018621108. Epub 2011 Apr 26.
The substitution rate in a gene can provide valuable information for understanding its functionality and evolution. A widely used method to estimate substitution rates is the maximum-likelihood method implemented in the CODEML program in the PAML package. Typically, only a limited number of branch models, chosen on the basis of a priori information or an interest in a particular lineage or lineages, are tested, whereas a large number of potential models are neglected. A complementary approach is therefore needed that tests all, or a large number of, possible models to search for the globally optimal model(s) of maximum likelihood. However, the computational time for such a search becomes impractically long even for a small number of sequences. Thus, it is desirable to explore only the most probable regions of the model space when searching for the optimal models. Using dynamic programming techniques, we developed a simple computational method for searching the most probable optimal branch-specific models in a practically feasible computational time. We propose three search methods to find the optimal models, which explore from O(n) (method 1) to O(n^2) (methods 2 and 3) models when the given phylogeny has n branches. In addition, we derived a formula to calculate the number of all possible models, revealing the complexity of finding the optimal branch-specific model. In a reanalysis of over 50 previously published studies, our methods found, for the vast majority, better models with significantly higher likelihoods than those obtained with the conventional hypothesis-driven approach.
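The abstract does not reproduce the formula for the number of possible models. As a rough illustration only, the sketch below assumes that a branch-specific model is any partition of the n branches into groups that share one substitution-rate ratio; under that assumption the number of models is the Bell number B_n. The function name `bell_numbers` and the partition assumption are ours, not taken from the paper, and the paper's own formula may differ. The sketch compares that count with the O(n) and O(n^2) budgets mentioned above.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], computed with the Bell triangle.

    Assumption (not from the abstract): each branch model is a set
    partition of the branches, so B_n counts the models for n branches.
    """
    bells = [1]          # B_0 = 1
    row = [1]            # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry is the entry to its left plus the entry
        # above that left neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    counts = bell_numbers(20)
    for n in (5, 10, 15, 20):
        # Compare the exhaustive model count with linear and quadratic budgets.
        print(f"n={n:2d}  all partitions={counts[n]}  n={n}  n^2={n * n}")
```

Even under this simplified counting, the exhaustive model space grows far faster than n^2, which is consistent with the motivation for restricting the search to the most probable models.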