Université de Lyon; Université Lyon 1; CNRS; INRIA; UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, F-69622 Villeurbanne, France.
BMC Bioinformatics. 2010 Jun 15;11:324. doi: 10.1186/1471-2105-11-324.
To understand the evolutionary role of Lateral Gene Transfer (LGT), accurate methods are needed to identify transferred genes and infer their timing of acquisition. Phylogenetic methods are particularly promising for this purpose, but the reconciliation of a gene tree with a reference (species) tree is computationally hard. In addition, the application of these methods to real data raises the problem of sorting out real and artifactual phylogenetic conflict.
We present Prunier, a new method for phylogenetic detection of LGT based on the search for a maximum statistical agreement forest (MSAF) between a gene tree and a reference tree. The program is flexible as it can use any definition of "agreement" among trees. We evaluate the performance of Prunier and two other programs (EEEP and RIATA-HGT) for their ability to detect transferred genes in realistic simulations where gene trees are reconstructed from sequences. Prunier proposes a single scenario whose sensitivity is comparable to that of the other methods, but with higher specificity. We show that LGT scenarios carry a strong signal about the position of the root of the species tree and could be used to identify the direction of evolutionary time on the species tree. We use Prunier on a biological dataset of 23 universal proteins and discuss their suitability for inferring the tree of life.
The ability of Prunier to take branch support into account during reconciliation reduces computational complexity relative to EEEP and improves accuracy relative to RIATA-HGT. Prunier's greedy algorithm proposes a single scenario of LGT for a gene family, but its quality is always comparable to the best solutions provided by the other algorithms. When the root position is uncertain in the species tree, Prunier is able to infer one scenario per root at a limited additional computational cost and can easily run on large datasets. Prunier is implemented in C++, using the Bio++ library and the phylogeny program Treefinder. It is available at: http://pbil.univ-lyon1.fr/software/prunier.
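The role of branch support in deciding what counts as real conflict can be illustrated with a minimal Python sketch. This is not Prunier's algorithm or its MSAF search. It only shows one possible definition of "agreement", under assumptions made here for illustration: trees are nested tuples of leaf names, gene-tree support values are supplied in a dictionary, the 0.9 threshold is arbitrary, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def splits(tree, anchor: str) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions of a tree given as nested tuples.

    Each bipartition is stored as the side that does NOT contain `anchor`,
    so the same split always gets the same representation.
    """
    found: Set[FrozenSet[str]] = set()

    def visit(node) -> FrozenSet[str]:
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        return frozenset().union(*(visit(child) for child in node))

    all_leaves = visit(tree)

    def collect(node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(collect(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:  # skip trivial splits
            found.add(side)
        return clade

    collect(tree)
    return found


def compatible(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Two anchor-free sides are compatible if they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def supported_conflicts(
    gene_support: Dict[FrozenSet[str], float],
    ref_splits: Set[FrozenSet[str]],
    threshold: float,
) -> List[FrozenSet[str]]:
    """Gene-tree splits that contradict the reference AND are well supported."""
    return [
        g
        for g, support in gene_support.items()
        if support >= threshold
        and any(not compatible(g, r) for r in ref_splits)
    ]


# Hypothetical six-taxon example.
reference = (("A", "B"), ("C", "D"), ("E", "F"))
gene = (("A", "C"), ("B", "D"), ("E", "F"))

ref_splits = splits(reference, anchor="A")

# Illustrative support values (e.g. bootstrap proportions) for the gene tree.
gene_support = {
    frozenset("BDEF"): 0.95,  # the A+C grouping, strongly supported
    frozenset("BD"): 0.40,    # the B+D grouping, weakly supported
    frozenset("EF"): 0.99,    # also present in the reference tree
}
assert set(gene_support) == splits(gene, anchor="A")

strong = supported_conflicts(gene_support, ref_splits, threshold=0.9)
everything = supported_conflicts(gene_support, ref_splits, threshold=0.0)
```

With the threshold at 0.9, only the A+C grouping is reported as a conflict; the weakly supported B+D grouping is treated as possible reconstruction noise. Lowering the threshold to 0.0 reports both, which is how a purely topological comparison would behave.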