The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference.

Affiliation

Section of Integrative Biology, University of Texas at Austin, 1 University Station C0930, Austin, TX 78712, USA.

Publication details

Syst Biol. 2009 Feb;58(1):130-45. doi: 10.1093/sysbio/syp017. Epub 2009 May 22.

Abstract

Although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. We use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ML) and Bayesian frameworks. By introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of one mechanism by which ambiguous data can mislead phylogenetic analyses. We find that in both ML and Bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. Furthermore, within a Bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. The magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified. The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.
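The following is a minimal Python sketch, not the authors' simulation code, that shows how an ambiguous character typically enters a likelihood calculation. It assumes the common convention in Felsenstein's pruning algorithm, where a missing or gap character ('?', '-' or 'N' here) gets a conditional likelihood vector of all ones. It also assumes the JC69 model, hypothetical branch lengths, and a simple equal-weight mixture of two relative rates (0.5, 1.5) that stands in for a discretized gamma distribution. All function names are illustrative. On the quartet ((A,B),(C,D)), a site where D is missing gives exactly the same likelihood as a three-taxon tree in which D is pruned and the internal branch is merged into C's branch. Such a site therefore carries no information about D's branch length or about how that combined length is split.

```python
import math

BASES = "ACGT"
AMBIGUOUS = set("?-N")  # assumption: these codes are treated as fully ambiguous


def jc_prob(i, j, t):
    """JC69 probability that base index i becomes base index j along a branch
    of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def tip_partial(ch):
    """Conditional likelihood vector at a leaf. An ambiguous character is
    compatible with every base, so it gets a vector of ones."""
    if ch in AMBIGUOUS:
        return [1.0] * 4
    return [1.0 if b == ch else 0.0 for b in BASES]


def up(partial, t):
    """Carry a child's partial likelihoods up a branch of length t:
    result[x] = sum_y P(x -> y | t) * partial[y]."""
    return [sum(jc_prob(x, y, t) * partial[y] for y in range(4)) for x in range(4)]


def site_lik_quartet(site, bl, rates=(1.0,)):
    """Site likelihood on the unrooted quartet ((A,B),(C,D)), evaluated at the
    node joining A and B. bl holds terminal lengths 'A'..'D' and the internal
    length 'int'; rates is an equal-weight mixture of relative rates."""
    total = 0.0
    for r in rates:
        a = up(tip_partial(site["A"]), r * bl["A"])
        b = up(tip_partial(site["B"]), r * bl["B"])
        c = up(tip_partial(site["C"]), r * bl["C"])
        d = up(tip_partial(site["D"]), r * bl["D"])
        cd = [c[k] * d[k] for k in range(4)]  # node joining C and D
        cd_up = up(cd, r * bl["int"])          # carried across the internal branch
        total += sum(0.25 * a[x] * b[x] * cd_up[x] for x in range(4))
    return total / len(rates)


def site_lik_triplet(site, bl, rates=(1.0,)):
    """Site likelihood on the three-taxon star tree (A,B,C)."""
    total = 0.0
    for r in rates:
        a, b, c = (up(tip_partial(site[k]), r * bl[k]) for k in "ABC")
        total += sum(0.25 * a[x] * b[x] * c[x] for x in range(4))
    return total / len(rates)


if __name__ == "__main__":
    bl = {"A": 0.1, "B": 0.2, "C": 0.15, "D": 0.3, "int": 0.05}  # hypothetical values
    rates = (0.5, 1.5)  # hypothetical two-category rate mixture with mean rate 1
    site = {"A": "A", "B": "A", "C": "G", "D": "?"}  # taxon D is missing

    quartet = site_lik_quartet(site, bl, rates)
    # Pruning D and merging the internal branch into C's branch gives the same value.
    triplet = site_lik_triplet(site, {"A": 0.1, "B": 0.2, "C": 0.15 + 0.05}, rates)
    print(math.isclose(quartet, triplet))  # True
    # D's branch length has no effect on this site's likelihood.
    print(math.isclose(quartet, site_lik_quartet(site, dict(bl, D=2.0), rates)))  # True
```

The equality holds because each row of a substitution matrix sums to one, so an all-ones vector passes unchanged up D's branch. The two remaining JC69 branches then compose into one branch of summed length. In an alignment with many such sites, fewer sites constrain the affected branches. The abstract describes how this reduced information can interact with among-site rate variation and priors.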
