Computer Science and Artificial Intelligence Laboratory, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Bioinformatics. 2012 Jun 15;28(12):i283-91. doi: 10.1093/bioinformatics/bts225.
Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction.
We present two new algorithms for the DTL reconciliation problem that are dramatically faster than existing algorithms, both asymptotically and in practice. We also extend the standard DTL reconciliation model by considering distance-dependent transfer costs, which allow for more accurate reconciliation, and we give an efficient algorithm for DTL reconciliation under this extended model. We implemented our new algorithms and demonstrated up to 100 000-fold speed-up over existing methods, using both simulated and biological datasets. This dramatic improvement makes it possible to use DTL reconciliation for performing rigorous evolutionary analyses of large gene families and enables its use in advanced reconciliation-based gene and species tree reconstruction methods.
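To illustrate what a distance-dependent transfer cost could look like, the following is a minimal sketch, not the implementation described here. It assumes that the distance between donor and recipient is the number of edges separating them in the species tree and that the cost grows linearly with that distance. The tree `PARENT`, the function names, and the parameters `base_cost` and `per_edge` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical species tree stored as child -> parent (the root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "AB": "root", "CD": "root",
    "A": "AB", "B": "AB",
    "C": "CD", "D": "CD",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path: List[str] = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path


def tree_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges on the path between a and b in the species tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            # n is the lowest common ancestor of a and b.
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def transfer_cost(donor: str, recipient: str,
                  parent: Dict[str, Optional[str]],
                  base_cost: float = 3.0, per_edge: float = 0.5) -> float:
    """Assumed linear model: base cost plus a penalty per edge of separation."""
    return base_cost + per_edge * tree_distance(donor, recipient, parent)


if __name__ == "__main__":
    print(transfer_cost("A", "B", PARENT))  # sibling species
    print(transfer_cost("A", "C", PARENT))  # species in different clades
```

Under this assumed model, a transfer between sibling species is cheaper than one between species in different clades, so a minimum-cost reconciliation would favour transfers between close relatives. Other distance measures (for example, branch lengths or divergence times) could be substituted for the edge count.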
Our programs can be freely downloaded from http://compbio.mit.edu/ranger-dtl/.