
Upper bounds on maximum likelihood for phylogenetic trees.

Authors

Hendy Michael D, Holland Barbara R

Affiliation

Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.

Publication

Bioinformatics. 2003 Oct;19 Suppl 2:ii66-72. doi: 10.1093/bioinformatics/btg1062.

Abstract

We introduce a mechanism for analytically deriving upper bounds on the maximum likelihood for genetic sequence data on sets of phylogenies. A simple 'partition' bound is introduced for general models. Tighter bounds are developed for the simplest model of evolution, the two-state symmetric model of nucleotide substitution under the molecular clock. This follows earlier theoretical work, which has been restricted to this model by analytic complexity. A weakness of current numerical computation is that reported 'maximum likelihood' results cannot be guaranteed, either for a specified tree (because of the possibility of multiple maxima) or over the full tree space (as the computation is intractable for large sets of trees). The bounds we develop here can be used to conclusively eliminate large proportions of tree space in the search for the maximum likelihood tree. This is vital in the development of a branch and bound search strategy for identifying the maximum likelihood tree. We report the results from a simulation study of approximately 10^6 data sets generated on clock-like trees of five leaves. In each trial, the likelihood value of one specific instance of a parameterised tree is compared to the bound determined for each of the 105 possible rooted binary trees. The proportion of trees eliminated from the search for the maximum likelihood tree ranged from 92% to almost 98%, indicating a computational speed-up factor of between 12 and 44.
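The arithmetic in the abstract can be checked with a short sketch. It is not the authors' implementation: the function names are hypothetical, the tree count uses the standard double-factorial formula (2n-3)!! for rooted binary leaf-labelled trees, the speed-up is assumed to be the reciprocal of the fraction of trees that survive pruning, and the pruning step only illustrates the general idea of discarding any tree whose upper bound falls below a likelihood already achieved.

```python
from math import prod


def num_rooted_binary_trees(n_leaves: int) -> int:
    """Count rooted binary leaf-labelled trees: (2n-3)!! for n >= 2."""
    if n_leaves < 2:
        return 1
    # Product of the odd numbers 1 * 3 * ... * (2n - 3).
    return prod(range(1, 2 * n_leaves - 2, 2))


def speedup(fraction_eliminated: float) -> float:
    """Assumed speed-up: reciprocal of the fraction of trees still to search."""
    return 1.0 / (1.0 - fraction_eliminated)


def prune_by_bound(upper_bounds: dict, incumbent_loglik: float) -> list:
    """Keep only trees whose upper bound is not below the incumbent value.

    upper_bounds maps a tree label to an upper bound on its maximum
    log-likelihood (hypothetical inputs; the bounds themselves come from
    the paper's analysis and are not computed here).
    """
    return [tree for tree, bound in upper_bounds.items()
            if bound >= incumbent_loglik]


if __name__ == "__main__":
    print(num_rooted_binary_trees(5))   # five leaves, as in the simulation
    print(speedup(0.92))                # lower end of the reported range
    # Toy bounds, invented purely for illustration.
    toy_bounds = {"T1": -120.4, "T2": -98.7, "T3": -101.2}
    print(prune_by_bound(toy_bounds, incumbent_loglik=-100.0))
```

Under these assumptions, five leaves give the 105 trees quoted in the abstract, and eliminating 92% of them corresponds to a factor of about 12.5. A factor of 44 corresponds to eliminating roughly 97.7% of trees, which is consistent with "almost 98%".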

