
A structural EM algorithm for phylogenetic inference.

Authors

Friedman Nir, Ninio Matan, Pe'er Itsik, Pupko Tal

Affiliation

School of Computer Science and Engineering, Hebrew University, Jerusalem, 91904, Israel.

Publication details

J Comput Biol. 2002;9(2):331-53. doi: 10.1089/10665270252935494.

DOI: 10.1089/10665270252935494
PMID: 12015885

Abstract

A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.
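The abstract does not say how the M-step searches over topologies. As a minimal sketch of one way such a search could be made cheap, the Python below assumes (purely for illustration) a Jukes-Cantor substitution model, expected counts of matching and mismatching sites for every pair of nodes as the sufficient statistics from the E-step, and a maximum-weight spanning tree as the topology search. The function names `jc_edge_score` and `max_spanning_tree` and the toy counts are hypothetical and are not taken from the paper.

```python
import math

def jc_edge_score(n_same, n_diff):
    """Best edge length and maximized expected log-likelihood for one
    candidate edge under the Jukes-Cantor model (an assumed model)."""
    total = n_same + n_diff
    # Fraction of differing sites, capped just below the saturation value 3/4.
    q = min(n_diff / total, 0.75 - 1e-9)
    # Closed-form Jukes-Cantor edge length for mismatch probability q.
    length = max(0.0, -0.75 * math.log(1.0 - 4.0 * q / 3.0))
    score = 0.0
    if n_same > 0:
        score += n_same * math.log(1.0 - q)   # matching sites
    if n_diff > 0:
        score += n_diff * math.log(q / 3.0)   # each specific mismatch has probability q/3
    return length, score

def max_spanning_tree(n_nodes, weights):
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:          # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Hypothetical expected (matching, mismatching) site counts from an E-step.
expected_counts = {
    (0, 1): (95, 5), (2, 3): (94, 6), (0, 2): (80, 20),
    (0, 3): (78, 22), (1, 2): (79, 21), (1, 3): (77, 23),
}
weights = {pair: jc_edge_score(*c)[1] for pair, c in expected_counts.items()}
tree = max_spanning_tree(4, weights)
print(tree)
```

Because each candidate edge is scored independently from the expected statistics, the topology search in this sketch reduces to sorting the pairwise scores, which is the kind of efficiency gain the abstract alludes to. A full implementation would also need the E-step and a way to turn the spanning tree into a valid phylogeny, both of which are omitted here.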

Similar articles

1
A structural EM algorithm for phylogenetic inference.
J Comput Biol. 2002;9(2):331-53. doi: 10.1089/10665270252935494.
2
Stochastic search strategy for estimation of maximum likelihood phylogenetic trees.
Syst Biol. 2001 Feb;50(1):7-17.
3
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
4
A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data.
Mol Biol Evol. 1998 Mar;15(3):277-83. doi: 10.1093/oxfordjournals.molbev.a025924.
5
Hessian calculation for phylogenetic likelihood based on the pruning algorithm and its applications.
Stat Appl Genet Mol Biol. 2012 Sep 25;11(4):Article 14. doi: 10.1515/1544-6115.1779.
6
Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.
Evolution. 2012 Mar;66(3):763-775. doi: 10.1111/j.1558-5646.2011.01476.x. Epub 2011 Nov 2.
7
DPRml: distributed phylogeny reconstruction by maximum likelihood.
Bioinformatics. 2005 Apr 1;21(7):969-74. doi: 10.1093/bioinformatics/bti100. Epub 2004 Oct 28.
8
Reconstruction of ancestral genomic sequences using likelihood.
J Comput Biol. 2007 Mar;14(2):216-37. doi: 10.1089/cmb.2006.0101.
9
EM for phylogenetic topology reconstruction on nonhomogeneous data.
BMC Evol Biol. 2014 Jun 17;14:132. doi: 10.1186/1471-2148-14-132.
10
SuperFine: fast and accurate supertree estimation.
Syst Biol. 2012 Mar;61(2):214-27. doi: 10.1093/sysbio/syr092. Epub 2011 Sep 20.

Cited by

1
Osteoclast-expanded supercharged NK cells perform superior antitumour effector functions.
BMJ Oncol. 2025 Jun 10;4(1):e000676. doi: 10.1136/bmjonc-2024-000676. eCollection 2025.
2
Transcriptomic diversity of innate lymphoid cells in human lymph nodes compared to BM and spleen.
Commun Biol. 2024 Jun 25;7(1):769. doi: 10.1038/s42003-024-06450-9.
3
Maximum Likelihood Inference of Time-scaled Cell Lineage Trees with Mixed-type Missing Data.
bioRxiv. 2024 Mar 23:2024.03.05.583638. doi: 10.1101/2024.03.05.583638.
4
A topology-marginal composite likelihood via a generalized phylogenetic pruning algorithm.
Algorithms Mol Biol. 2023 Jul 31;18(1):10. doi: 10.1186/s13015-023-00235-1.
5
Predicting Functional Effects of Synonymous Variants: A Systematic Review and Perspectives.
Front Genet. 2019 Oct 7;10:914. doi: 10.3389/fgene.2019.00914. eCollection 2019.
6
Comparing 3D Genome Organization in Multiple Species Using Phylo-HMRF.
Cell Syst. 2019 Jun 26;8(6):494-505.e14. doi: 10.1016/j.cels.2019.05.011.
7
Maximum Likelihood Estimation of Symmetric Group-Based Models via Numerical Algebraic Geometry.
Bull Math Biol. 2019 Feb;81(2):337-360. doi: 10.1007/s11538-018-0523-2. Epub 2018 Oct 24.
8
Continuous-Trait Probabilistic Model for Comparing Multi-species Functional Genomic Data.
Cell Syst. 2018 Aug 22;7(2):208-218.e11. doi: 10.1016/j.cels.2018.05.022. Epub 2018 Jun 20.
9
Learning Predictive Interactions Using Information Gain and Bayesian Network Scoring.
PLoS One. 2015 Dec 1;10(12):e0143247. doi: 10.1371/journal.pone.0143247. eCollection 2015.
10
LEAP: biomarker inference through learning and evaluating association patterns.
Genet Epidemiol. 2015 Mar;39(3):173-84. doi: 10.1002/gepi.21889. Epub 2015 Feb 12.