Dynamic programming procedure for searching optimal models to estimate substitution rates based on the maximum-likelihood method.

Affiliation

National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research Wuhan, Huazhong Agricultural University, Wuhan 430070, China.

Publication information

Proc Natl Acad Sci U S A. 2011 May 10;108(19):7860-5. doi: 10.1073/pnas.1018621108. Epub 2011 Apr 26.

Abstract

The substitution rate in a gene can provide valuable information for understanding its functionality and evolution. A widely used method to estimate substitution rates is the maximum-likelihood method implemented in the CODEML program in the PAML package. Typically, only a limited number of branch models, chosen based on a priori information or an interest in a particular lineage(s), are tested, whereas a large number of potential models are neglected. A complementary approach is therefore needed that tests all, or a large number of, possible models to search for the globally optimal model(s) of maximum likelihood. However, the computational time for this search becomes impractically long even for a small number of sequences. Thus, it is desirable to explore only the most probable regions of the model space when searching for the optimal models. Using dynamic programming techniques, we developed a simple computational method for searching the most probable optimal branch-specific models in a practically feasible computational time. We propose three search methods to find the optimal models, which explore from O(n) (method 1) to O(n^2) (method 2 and method 3) models when the given phylogeny has n branches. In addition, we derived a formula to calculate the number of all possible models, revealing the complexity of finding the optimal branch-specific model. We show that in a reanalysis of over 50 previously published studies, the vast majority obtained better models with significantly higher likelihoods than the conventional hypothesis model methods.
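
To give a feel for why exhaustive testing is impractical, here is a minimal sketch. It assumes that a branch model is any partition of the n branches into groups sharing one substitution-rate ratio. Under that assumption the number of possible models is the Bell number B(n). The abstract does not reproduce the paper's own formula, so this is an illustration rather than the authors' derivation. The function name `bell_number` is hypothetical.

```python
def bell_number(n: int) -> int:
    """Return the Bell number B(n), the number of ways to partition n items.

    Assumption for illustration: each partition of the n branches into
    rate classes counts as one candidate branch model.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Build the Bell triangle row by row; the first entry of row k is B(k).
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


if __name__ == "__main__":
    # Compare the full model space with the O(n) and O(n^2) search budgets.
    for n in (5, 10, 15, 20):
        print(f"n={n:2d}  n^2={n * n:4d}  all partitions={bell_number(n)}")
```

Under this assumption, a tree with only 10 branches already admits 115,975 candidate models, compared with on the order of 10 to 100 models explored by searches that scale as O(n) to O(n^2).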


