Bayesian estimation of concordance among gene trees.

Authors

Ané Cécile, Larget Bret, Baum David A, Smith Stacey D, Rokas Antonis

Affiliation

Department of Statistics, University of Wisconsin, USA.

Publication information

Mol Biol Evol. 2007 Feb;24(2):412-26. doi: 10.1093/molbev/msl170. Epub 2006 Nov 9.

Abstract

Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we may also expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore, evidence about the tree topology for one gene should influence our inferences of the tree topology for a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, of the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.
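
As an illustration of how a posterior sample of GTMs might be summarized into a sample-wide concordance factor, the following is a minimal Python sketch and not the authors' implementation. It assumes each GTM is stored as a list holding one tree label per gene, and that each tree label maps to the set of clades the tree contains. The names `sample_wide_cf`, `summarize`, `tree_clades`, and `gtm_samples`, as well as the toy trees and samples, are hypothetical and chosen only for this example.

```python
from collections import Counter
from fractions import Fraction

def sample_wide_cf(gtm_samples, tree_clades, clade):
    """For each sampled GTM, return the proportion of genes whose
    assigned tree contains the given clade."""
    cfs = []
    for gtm in gtm_samples:
        n_with_clade = sum(1 for tree in gtm if clade in tree_clades[tree])
        cfs.append(Fraction(n_with_clade, len(gtm)))
    return cfs

def summarize(cfs):
    """Return the posterior mean and the empirical posterior
    distribution of the concordance factor."""
    mean = sum(cfs, Fraction(0)) / len(cfs)
    counts = Counter(cfs)
    dist = {value: Fraction(n, len(cfs)) for value, n in sorted(counts.items())}
    return mean, dist

# Hypothetical toy data: two candidate trees on taxa A-D, described by their clades.
tree_clades = {
    "T1": {frozenset("AB"), frozenset("CD")},
    "T2": {frozenset("AC"), frozenset("BD")},
}

# Hypothetical output of a second-stage MCMC: four sampled GTMs for three genes.
gtm_samples = [
    ["T1", "T1", "T2"],
    ["T1", "T1", "T1"],
    ["T1", "T2", "T2"],
    ["T1", "T1", "T2"],
]

cfs = sample_wide_cf(gtm_samples, tree_clades, frozenset("AB"))
mean_cf, cf_dist = summarize(cfs)
```

Here `cf_dist` gives the estimated posterior probability of each possible concordance factor for the clade {A, B}, and `mean_cf` is its posterior mean. Exact fractions are used so that equal proportions are grouped together without floating-point rounding.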
