Scornavacca Celine, Jacox Edwin, Szöllősi Gergely J
ISEM, UM2-CNRS-IRD, Place Eugène Bataillon 34095 Montpellier, France, Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France and ELTE-MTA 'Lendület' Biophysics Research Group 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary.
Bioinformatics. 2015 Mar 15;31(6):841-8. doi: 10.1093/bioinformatics/btu728. Epub 2014 Nov 6.
Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, which are generally more computationally efficient, require a prior estimate of parameters and of the statistical support.
Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony-based, species-tree-aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events.
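The idea of a score that combines event costs with a sequence-likelihood term can be illustrated with a minimal sketch. This is not the TERA implementation: the function `joint_score`, the class `EventCosts`, the default cost values, the `weight` parameter and the two candidate trees are all hypothetical, and the abstract does not state how the two terms are actually combined. The sketch simply assumes a weighted sum of a duplication-transfer-loss (DTL) cost and a negative log-likelihood, where lower is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCosts:
    """Per-event parsimony costs (illustrative values, not taken from the paper)."""
    duplication: float = 2.0
    transfer: float = 3.0
    loss: float = 1.0


def joint_score(n_dup, n_transfer, n_loss, log_likelihood,
                costs=EventCosts(), weight=1.0):
    """Combine a DTL reconciliation cost with a sequence-likelihood term.

    Assumed form (hypothetical): DTL cost minus weight * log-likelihood,
    so that both fewer events and higher likelihood lower the score.
    """
    dtl_cost = (costs.duplication * n_dup
                + costs.transfer * n_transfer
                + costs.loss * n_loss)
    return dtl_cost - weight * log_likelihood


# Two made-up candidate reconciled gene trees:
# (duplications, transfers, losses, sequence log-likelihood estimate)
candidates = {
    "A": (0, 3, 1, -10.0),  # fits the sequences better, needs three transfers
    "B": (1, 1, 2, -12.0),  # fits slightly worse, needs only one transfer
}

scores = {name: joint_score(*events) for name, events in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Under these assumed costs, candidate B scores lower than A even though its sequence likelihood is worse, which is the kind of trade-off through which species tree information can reduce the number of inferred transfers.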