Data partitions and complex models in Bayesian analysis: the phylogeny of Gymnophthalmid lizards.

Authors

Castoe Todd A, Doan Tiffany M, Parkinson Christopher L

Affiliation

Department of Biology, University of Central Florida, 4000 Central Florida Boulevard, Orlando, FL 32816-2368, USA.

Publication

Syst Biol. 2004 Jun;53(3):448-69. doi: 10.1080/10635150490445797.

Abstract

Phylogenetic studies incorporating multiple loci, and multiple genomes, are becoming increasingly common. Coincident with this trend in genetic sampling, model-based likelihood techniques including Bayesian phylogenetic methods continue to gain popularity. Few studies, however, have examined model fit and sensitivity to such potentially heterogeneous data partitions within combined data analyses using empirical data. Here we investigate the relative model fit and sensitivity of Bayesian phylogenetic methods when alternative site-specific partitions of among-site rate variation (with and without autocorrelated rates) are considered. Our primary goal in choosing a best-fit model was to employ the simplest model that was a good fit to the data while optimizing topology and/or Bayesian posterior probabilities. Thus, we were not interested in complex models that did not practically affect our interpretation of the topology under study. We applied these alternative models to a four-gene data set including one protein-coding nuclear gene (c-mos), one protein-coding mitochondrial gene (ND4), and two mitochondrial rRNA genes (12S and 16S) for the diverse yet poorly known lizard family Gymnophthalmidae. Our results suggest that the best-fit model partitioned among-site rate variation separately among the c-mos, ND4, and 12S + 16S gene regions. We found this model yielded identical topologies to those from analyses based on the GTR+I+G model, but significantly changed posterior probability estimates of clade support. This partitioned model also produced more precise (less variable) estimates of posterior probabilities across generations of long Bayesian runs, compared to runs employing a GTR+I+G model estimated for the combined data. We use this three-way gamma partitioning in Bayesian analyses to reconstruct a robust phylogenetic hypothesis for the relationships of genera within the lizard family Gymnophthalmidae. We then reevaluate the higher-level taxonomic arrangement of the Gymnophthalmidae. Based on our findings, we discuss the utility of nontraditional parameters for modeling among-site rate variation and the implications and future directions for complex model building and testing.
