Paszek Jarosław, Górecki Paweł
University of Warsaw, Institute of Informatics, Banacha 2, Warsaw, 02-097, Poland.
BMC Genomics. 2016 Jan 11;17 Suppl 1(Suppl 1):15. doi: 10.1186/s12864-015-2308-4.
Discovering the location of gene duplications and multiple gene duplication episodes is a fundamental issue in evolutionary molecular biology. The problem introduced by Guigó et al. in 1996 is to map gene duplication events from a collection of rooted, binary gene family trees onto their corresponding rooted binary species tree in such a way that the total number of multiple gene duplication episodes is minimized. There are several models in the literature that specify how gene duplications from gene families can be interpreted as one duplication episode. However, in all duplication episode problems gene trees are rooted. This restriction limits the applicability of these models, since unrooted gene family trees are frequently inferred by phylogenetic methods.
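To make the mapping of duplications onto a species tree concrete, the sketch below uses the standard least common ancestor (lca) mapping from reconciliation: a gene tree node is a duplication when it maps to the same species tree node as one of its children. This is only an illustration under assumptions that are not in the original text: trees are encoded as nested tuples with species names as leaves, and the function names `clusters`, `lca_map` and `duplications` are hypothetical. It finds the lowest placement of each duplication and does not perform the episode clustering itself.

```python
def clusters(tree):
    """Return the set of leaf labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return clusters(left) | clusters(right)


def lca_map(species_tree, labels):
    """Return the smallest subtree of species_tree that contains all labels."""
    node = species_tree
    while not isinstance(node, str):
        left, right = node
        if labels <= clusters(left):
            node = left
        elif labels <= clusters(right):
            node = right
        else:
            break  # labels are split between both children: this is the lca
    return node


def duplications(gene_tree, species_tree):
    """List the species tree nodes (as subtrees) that receive a duplication."""
    found = []

    def visit(g):
        if isinstance(g, str):
            return lca_map(species_tree, frozenset([g]))
        m_left, m_right = visit(g[0]), visit(g[1])
        m = lca_map(species_tree, clusters(g))
        # Duplication: the node maps to the same place as one of its children.
        if m == m_left or m == m_right:
            found.append(m)
        return m

    visit(gene_tree)
    return found


# Hypothetical example: two copies of the (a, b) clade imply one duplication.
species = (("a", "b"), "c")
gene = ((("a", "b"), ("a", "b")), "c")
print(duplications(gene, species))  # [('a', 'b')]
```

Comparing subtrees by value works here because species names are assumed to be unique in the species tree, so each subtree identifies exactly one node.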
In this article we present the first solution to the open problem of episode clustering where the input gene family trees are unrooted. In particular, by using theoretical properties of unrooted reconciliation, we give an efficient algorithm that reduces this problem to the episode clustering problems defined for rooted trees. We establish theoretical properties of the reduction algorithm and evaluate it on empirical datasets.
We provide algorithms and tools that were successfully applied to several empirical datasets. In particular, our comparative study shows that we can improve known results on genomic duplication inference from real datasets.
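For intuition about why unrooted input is harder, note that an unrooted binary tree with n leaves has 2n - 3 edges, and placing the root on each edge yields a different rooted tree. The sketch below enumerates these rootings naively. It is not the authors' reduction, which according to the abstract relies on theoretical properties of unrooted reconciliation. The edge-list encoding, the `labels` dictionary and the function name `rootings` are assumptions made for illustration.

```python
from collections import defaultdict


def rootings(edges, labels):
    """Return one rooted nested-tuple tree for every edge of an unrooted tree.

    edges  : list of (u, v) pairs over arbitrary node ids (assumed encoding)
    labels : dict mapping each leaf id to its species name
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def subtree(node, parent):
        # Every neighbour except the one we came from is a child.
        children = [n for n in adj[node] if n != parent]
        if not children:
            return labels[node]
        return tuple(subtree(c, node) for c in children)

    # Rooting on edge (u, v) makes the two sides of that edge the root's children.
    return [(subtree(u, v), subtree(v, u)) for u, v in edges]


# Hypothetical quartet: leaves 0..3, internal nodes 4 and 5.
labels = {0: "a", 1: "b", 2: "c", 3: "d"}
edges = [(0, 4), (1, 4), (4, 5), (2, 5), (3, 5)]
for tree in rootings(edges, labels):
    print(tree)
```

Each rooted tree produced this way could in principle be passed to a method for rooted trees, but doing so for every combination of rootings across a collection of gene trees grows quickly, which is why a more careful reduction is needed.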