Long-branch attraction bias and inconsistency in Bayesian phylogenetics.

Affiliation

Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America.

Publication information

PLoS One. 2009 Dec 9;4(12):e7891. doi: 10.1371/journal.pone.0007891.

DOI: 10.1371/journal.pone.0007891

PMID: 20011052

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2785476/

Abstract

Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias--which is apparent under both controlled simulation conditions and in analyses of empirical sequence data--also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages--that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.
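
The contrast the abstract draws between estimating branch lengths (ML) and integrating over them (BI) can be written schematically as below. This is a standard textbook formulation, not an equation taken from the paper, and the notation is an assumption introduced here for illustration: D is the sequence alignment, \tau a tree topology, b the vector of branch lengths, and \theta the remaining model parameters.

```latex
% Assumed notation (not from the paper): D = alignment, \tau = topology,
% b = branch-length vector, \theta = other substitution-model parameters.
\begin{align}
  % ML: branch lengths and parameters are optimized for each topology.
  \hat{\tau}_{\mathrm{ML}}
    &= \arg\max_{\tau} \; \max_{b,\,\theta} \; P(D \mid \tau, b, \theta), \\
  % BI: branch lengths and parameters are integrated out against their prior.
  P(\tau \mid D)
    &\propto P(\tau) \int\!\!\int P(D \mid \tau, b, \theta)\,
       P(b, \theta \mid \tau)\, \mathrm{d}b\, \mathrm{d}\theta, \\
  % The maximum a posteriori topology referred to in the abstract.
  \hat{\tau}_{\mathrm{MAP}}
    &= \arg\max_{\tau} \; P(\tau \mid D).
\end{align}
```

Under this reading, ML scores each topology by the single best-fitting branch-length vector, while BI scores it by a prior-weighted average over all branch-length vectors. That averaging step is what the abstract identifies as the source of the bias.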
