Department of Statistics, University of Georgia, Athens, GA 30602, USA.
BMC Bioinformatics. 2011 Nov 1;12:426. doi: 10.1186/1471-2105-12-426.
A birth and death process is frequently used for modeling the size of a gene family that may vary along the branches of a phylogenetic tree. Under the birth and death model, maximum likelihood methods have been developed to estimate the birth and death rate and the sizes of ancient gene families (numbers of gene copies at the internodes of the phylogenetic tree). This paper aims to provide a Bayesian approach for estimating parameters in the birth and death model.
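To make the model concrete, the following minimal sketch computes the probability that a gene family changes from s copies to c copies along one branch. It assumes a linear birth and death process in which the per-copy birth rate and death rate are equal (a single rate, here called `lam`), for which a standard closed-form transition probability exists. The function name `transition_prob` and its parameters are illustrative and are not taken from the paper.

```python
from math import comb


def transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(size c at time t | size s at time 0), assuming birth rate = death rate = lam."""
    if s == 0:
        # A family with no copies cannot regain any under this model.
        return 1.0 if c == 0 else 0.0
    alpha = lam * t / (1.0 + lam * t)
    total = 0.0
    for j in range(min(s, c) + 1):
        total += (
            comb(s, j)
            * comb(s + c - j - 1, s - 1)
            * alpha ** (s + c - 2 * j)
            * (1.0 - 2.0 * alpha) ** j
        )
    return total


if __name__ == "__main__":
    # Probability that a single-copy family is lost when lam * t = 1.
    print(transition_prob(1, 0, lam=0.5, t=2.0))
```

With equal rates the expected family size stays at s, so changes in size along a branch are driven entirely by the variance of the process, which grows with `lam * t`.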
We develop a Bayesian approach for estimating the birth and death rate and other parameters in the birth and death model. In addition, a Bayesian hypothesis test is developed to identify the gene families that are unlikely under the birth and death process. Simulation results suggest that the Bayesian estimate is more accurate than the maximum likelihood estimate of the birth and death rate. The Bayesian approach was applied to a real dataset of 3517 gene families across genomes of five yeast species. The results indicate that the Bayesian model assuming a constant birth and death rate among branches of the phylogenetic tree cannot adequately explain the observed pattern of the sizes of gene families across species. The yeast dataset was thus analyzed with a Bayesian heterogeneous rate model that allows the birth and death rate to vary among the branches of the tree. The unlikely gene families identified by the Bayesian heterogeneous rate model are different from those given by the maximum likelihood method.
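As a rough illustration of Bayesian estimation of the rate, the sketch below approximates a posterior distribution for `lam` on a grid. It is a heavily simplified stand-in and not the paper's procedure: it assumes a single branch of known length `t`, family sizes observed at both ends of that branch, equal birth and death rates, and an exponential prior on `lam`. The prior and the names `grid_posterior` and `prior_rate` are all assumptions made for this example.

```python
from math import comb, exp, log


def transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(size c at time t | size s at time 0), assuming birth rate = death rate = lam."""
    if s == 0:
        return 1.0 if c == 0 else 0.0
    alpha = lam * t / (1.0 + lam * t)
    return sum(
        comb(s, j) * comb(s + c - j - 1, s - 1)
        * alpha ** (s + c - 2 * j) * (1.0 - 2.0 * alpha) ** j
        for j in range(min(s, c) + 1)
    )


def grid_posterior(pairs, t, grid, prior_rate=1.0):
    """Normalized posterior weights over `grid` for lam.

    pairs: (start_size, end_size) tuples, each with start_size >= 1.
    grid:  strictly positive candidate values of lam (lam = 0 would give
           zero likelihood to any family whose size changed).
    prior_rate: rate of the assumed Exponential prior on lam.
    """
    log_post = []
    for lam in grid:
        log_lik = sum(log(transition_prob(s, c, lam, t)) for s, c in pairs)
        log_post.append(log_lik - prior_rate * lam)
    peak = max(log_post)  # subtract the maximum before exponentiating, for stability
    weights = [exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


if __name__ == "__main__":
    grid = [0.01 * k for k in range(1, 101)]  # lam from 0.01 to 1.00
    families = [(1, 3), (2, 0), (2, 2), (1, 1)]  # made-up counts, for illustration only
    post = grid_posterior(families, t=1.0, grid=grid)
    print("posterior mean of lam:", sum(l * p for l, p in zip(grid, post)))
```

The grid is kept to values with `lam * t <= 1` so that every term in the sum is non-negative; for larger values the terms alternate in sign and the floating-point sum can lose accuracy.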
Compared to the maximum likelihood method, the Bayesian approach can produce more accurate estimates of the parameters in the birth and death model. In addition, the Bayesian hypothesis test is able to identify unlikely gene families based on Bayesian posterior p-values. The Bayesian approach can thus extract information from gene family data and shed light on the evolutionary process of gene families across genomes.