LIRMM, Montpellier.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jan-Feb;9(1):26-39. doi: 10.1109/TCBB.2011.64. Epub 2011 Mar 30.
Inferring an evolutionary scenario for a gene family is a fundamental problem with applications in both functional and evolutionary genomics. The gene tree/species tree reconciliation approach has been widely used to address this problem, but mostly in a discrete parsimony framework that aims to minimize the number of gene duplications and/or gene losses. Recently, a probabilistic approach has been developed, based on the classical birth-and-death process, including efficient algorithms for computing posterior probabilities of reconciliations and orthology prediction.
In previous work, we described an algorithm for exploring the whole space of gene tree/species tree reconciliations, which we adapt here to efficiently compute the posterior probability of such reconciliations. These posterior probabilities can be either computed exactly or approximated, depending on the size of the reconciliation space. We use this algorithm to analyze the probabilistic landscape of the space of reconciliations for a real data set of fungal gene families and several data sets of synthetic gene trees.
The results of our simulations suggest that, with exact gene trees obtained by a simple birth-and-death process and realistic gene duplication/loss rates, only a very small subset of all reconciliations needs to be explored to closely approximate the posterior probability of the most likely reconciliations. For cases where the posterior probability mass is more evenly dispersed, our method allows the required subspace of reconciliations to be explored efficiently.
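The difference between exact and approximated posterior probabilities can be illustrated with a minimal sketch. It is not the algorithm described above: it assumes that an unnormalized log-likelihood is already available for every reconciliation, and the function names and toy values are hypothetical. The exact posterior normalizes over all reconciliations, whereas the approximation normalizes over only the k most likely ones.

```python
import math

def exact_posteriors(log_weights):
    """Normalize unnormalized log-likelihoods over the whole space."""
    m = max(log_weights)  # subtract the maximum for numerical stability
    total = sum(math.exp(w - m) for w in log_weights)
    return [math.exp(w - m) / total for w in log_weights]

def approx_top_posterior(log_weights, k):
    """Posterior of the most likely item, normalized over the k best only.

    Assumption: the k most likely reconciliations have been enumerated;
    the remaining ones are ignored, so this is an upper bound on the
    exact value.
    """
    best = sorted(log_weights, reverse=True)[:k]
    m = best[0]
    return 1.0 / sum(math.exp(w - m) for w in best)

def mass_covered(log_weights, k):
    """Fraction of the total posterior mass held by the k most likely items."""
    post = sorted(exact_posteriors(log_weights), reverse=True)
    return sum(post[:k])

# Hypothetical log-likelihoods for six reconciliations of one gene tree.
toy_log_weights = [-10.0, -10.5, -14.0, -15.0, -18.0, -20.0]

if __name__ == "__main__":
    exact = max(exact_posteriors(toy_log_weights))
    for k in (1, 2, 3, 6):
        approx = approx_top_posterior(toy_log_weights, k)
        covered = mass_covered(toy_log_weights, k)
        print(f"k={k}: approx={approx:.4f} exact={exact:.4f} mass={covered:.4f}")
```

In this toy landscape the two most likely reconciliations hold most of the posterior mass, so the approximation converges quickly; with a flatter distribution a larger k would be needed.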