A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data.

Author information

Lewis P O

Affiliation

Department of Biology, University of New Mexico, Albuquerque 87131-1091, USA.

Publication information

Mol Biol Evol. 1998 Mar;15(3):277-83. doi: 10.1093/oxfordjournals.molbev.a025924.

PMID:9501494

Abstract

Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
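The two quantitative ideas in the abstract, the size of the tree space and reproduction in proportion to rank, can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the use of `random.choices`, and the convention that rank 1 is the worst tree are all assumptions made for illustration, and mutation, recombination, and branch-length optimization are omitted. The counts use the standard double-factorial formulas for labeled bifurcating trees, (2n-5)!! unrooted and (2n-3)!! rooted. Under those formulas, 14 taxa give about 3.2 × 10^11 unrooted trees, while the abstract's "more than seven trillion" matches the rooted count for 14 taxa (equivalently, the unrooted count for 15 taxa).

```python
import random


def odd_double_factorial(k):
    """Return 1 * 3 * 5 * ... * k for odd k >= 1 (returns 1 when k == 1)."""
    result = 1
    for i in range(3, k + 1, 2):
        result *= i
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted, bifurcating, labeled trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa):
    """Number of rooted, bifurcating, labeled trees: (2n - 3)!! for n >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n_taxa - 3)


def rank_weights(log_likelihoods):
    """Selection weights proportional to rank (assumed: 1 = worst, N = best).

    Ties are broken by position, which is an arbitrary choice for this sketch.
    """
    n = len(log_likelihoods)
    order = sorted(range(n), key=lambda i: log_likelihoods[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = n * (n + 1) // 2  # sum of ranks 1..N
    return [r / total for r in ranks]


def next_generation(population, log_likelihoods, rng):
    """Draw a same-sized population; expected offspring scale with rank."""
    weights = rank_weights(log_likelihoods)
    return rng.choices(population, weights=weights, k=len(population))


if __name__ == "__main__":
    print(count_unrooted_trees(14))  # unrooted trees for 14 taxa
    print(count_rooted_trees(14))    # rooted trees for 14 taxa

    # Hypothetical population: tree labels with made-up log-likelihoods.
    trees = ["tree_a", "tree_b", "tree_c"]
    scores = [-120.0, -100.0, -110.0]
    print(rank_weights(scores))
    print(next_generation(trees, scores, random.Random(0)))
```

Using ranks rather than raw likelihoods keeps selection pressure independent of the scale of the scores, which matters because log-likelihoods of competing trees often differ by small amounts relative to their magnitude.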

Similar articles

Genetic algorithm for large-scale maximum parsimony phylogenetic analysis of proteins.

Biochim Biophys Acta. 2005 Aug 30;1725(1):19-29. doi: 10.1016/j.bbagen.2005.04.027.

Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used.

Mol Biol Evol. 2000 Aug;17(8):1251-8. doi: 10.1093/oxfordjournals.molbev.a026408.

NJML: a hybrid algorithm for the neighbor-joining and maximum-likelihood methods.

Mol Biol Evol. 2000 Sep;17(9):1401-9. doi: 10.1093/oxfordjournals.molbev.a026423.

Genetic algorithms and parallel processing in maximum-likelihood phylogeny inference.

Mol Biol Evol. 2002 Oct;19(10):1717-26. doi: 10.1093/oxfordjournals.molbev.a003994.

A structural EM algorithm for phylogenetic inference.

J Comput Biol. 2002;9(2):331-53. doi: 10.1089/10665270252935494.

A heuristic approach of maximum likelihood method for inferring phylogenetic tree and an application to the mammalian SOX-3 origin of the testis-determining gene SRY.

FEBS Lett. 1999 Dec 10;463(1-2):129-32. doi: 10.1016/s0014-5793(99)01621-x.

Genetic algorithm-based maximum-likelihood analysis for molecular phylogeny.

J Mol Evol. 2001 Oct-Nov;53(4-5):477-84. doi: 10.1007/s002390010238.

Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

Evolution. 2012 Mar;66(3):763-775. doi: 10.1111/j.1558-5646.2011.01476.x. Epub 2011 Nov 2.

Accelerated likelihood surface exploration: the likelihood ratchet.

Syst Biol. 2003 Jun;52(3):368-73. doi: 10.1080/10635150390196993.

Cited by

Characterization of two novel species of the genus reveals the key role of vertical inheritance in the evolution of alginate utilization loci.

Microbiol Spectr. 2025 Aug 5;13(8):e0091725. doi: 10.1128/spectrum.00917-25. Epub 2025 Jul 7.

The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning.

Mol Biol Evol. 2024 Jun 1;41(6). doi: 10.1093/molbev/msae105.

Inferring language dispersal patterns with velocity field estimation.

Nat Commun. 2024 Jan 2;15(1):190. doi: 10.1038/s41467-023-44430-5.

An evolution strategy approach for the balanced minimum evolution problem.

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad660.

Roadmap to the study of gene and protein phylogeny and evolution-A practical guide.

PLoS One. 2023 Feb 24;18(2):e0279597. doi: 10.1371/journal.pone.0279597. eCollection 2023.

A LASSO-based approach to sample sites for phylogenetic tree search.

Bioinformatics. 2022 Jun 24;38(Suppl 1):i118-i124. doi: 10.1093/bioinformatics/btac252.

PhyloMissForest: a random forest framework to construct phylogenetic trees with missing data.

BMC Genomics. 2022 May 18;23(1):377. doi: 10.1186/s12864-022-08540-6.

GATC: a genetic algorithm for gene tree construction under the Duplication-Transfer-Loss model of evolution.

BMC Genomics. 2018 May 9;19(Suppl 2):102. doi: 10.1186/s12864-018-4455-x.

Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies.

Int J Mol Sci. 2017 Dec 26;19(1):62. doi: 10.3390/ijms19010062.

IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.

Mol Biol Evol. 2015 Jan;32(1):268-74. doi: 10.1093/molbev/msu300. Epub 2014 Nov 3.
