Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Nov-Dec;8(6):1685-91. doi: 10.1109/TCBB.2011.83.
When gene copies are sampled from various species, the resulting gene tree may disagree with the containing species tree. The primary causes of gene tree and species tree discordance are incomplete lineage sorting, horizontal gene transfer, and gene duplication and loss. Each of these events yields a different parsimony criterion for inferring the (containing) species tree from gene trees. Under incomplete lineage sorting, species tree inference seeks the tree that minimizes the number of extra gene lineages that had to coexist along species lineages; under gene duplication, it seeks the tree that minimizes gene duplications and/or losses. In this paper, we present the following results: 1) In the reconciliation of a uniquely leaf-labeled gene tree and a species tree, the deep coalescence cost equals the number of gene losses minus twice the gene duplication cost. The deep coalescence cost can be computed in linear time for any gene tree and species tree. 2) In the reconciliation of an arbitrary gene tree and a species tree, the deep coalescence cost is never less than the gene duplication cost. 3) Species tree inference by minimizing deep coalescence events is NP-hard.
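As an illustration of result 1, the following is a minimal sketch, not the paper's linear-time algorithm. It assumes rooted binary trees written as nested tuples of species names, a gene tree with exactly one leaf per species in the species tree, and the usual least-common-ancestor (LCA) reconciliation with its standard counting rules for duplications, losses, and extra lineages. The function names `species_depths` and `reconcile` are hypothetical, and the LCA lookup is deliberately naive (quadratic) for brevity.

```python
def species_depths(tree):
    """Map each species-tree node, identified by its leaf set, to its depth."""
    depths = {}

    def walk(node, depth):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child, depth + 1) for child in node))
        depths[leaves] = depth
        return leaves

    walk(tree, 0)
    return depths


def reconcile(gene_tree, species_tree):
    """Return (duplications, losses, deep_coalescence) under LCA reconciliation.

    Assumption: both trees are binary and share the same leaf set, with each
    species appearing exactly once in the gene tree.
    """
    depths = species_depths(species_tree)

    def lca(leaves):
        # Deepest species node whose leaf set contains `leaves` (naive search).
        return max((s for s in depths if leaves <= s), key=depths.get)

    dup = loss = path_edges = 0

    def walk(node):
        """Return (leaf set of the gene node, its image in the species tree)."""
        nonlocal dup, loss, path_edges
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca(leaves)
        children = [walk(child) for child in node]
        leaves = frozenset().union(*(c_leaves for c_leaves, _ in children))
        image = lca(leaves)
        # Number of species-tree edges between this node's image and each child's image.
        dists = [depths[c_image] - depths[image] for _, c_image in children]
        is_dup = any(d == 0 for d in dists)  # a child maps to the same species node
        dup += is_dup
        loss += sum(dists) - (0 if is_dup else 2)
        path_edges += sum(dists)  # species edges crossed by the child gene lineages
        return leaves, image

    walk(gene_tree)
    # Extra lineages: lineages summed over species edges, minus one per species edge.
    deep = path_edges - (len(depths) - 1)
    return dup, loss, deep


species = (("a", "b"), ("c", "d"))
gene = ((("a", "c"), "b"), "d")
dup, loss, deep = reconcile(gene, species)
print(dup, loss, deep)  # 2 6 2, so deep == loss - 2 * dup
```

In this small example the two extra lineages sit on the species edges above (a, b) and (c, d), and the counts also satisfy result 2, since the deep coalescence cost (2) is not less than the duplication cost (2).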