Disanto Filippo, Rosenberg Noah A
Department of Biology, Stanford University, Stanford, California.
J Comput Biol. 2017 Sep;24(9):831-850. doi: 10.1089/cmb.2016.0159. Epub 2017 Apr 24.
Given a gene tree and a species tree, ancestral configurations represent the combinatorially distinct sets of gene lineages that can reach a given node of the species tree. They have been introduced as a data structure for use in the recursive computation of the conditional probability under the multispecies coalescent model of a gene tree topology given a species tree, the cost of this computation being affected by the number of ancestral configurations of the gene tree in the species tree. For matching gene trees and species trees, we obtain enumerative results on ancestral configurations. We study ancestral configurations in balanced and unbalanced families of trees determined by a given seed tree, showing that for seed trees with more than one taxon, the number of ancestral configurations increases for both families exponentially in the number of taxa n. For fixed n, the maximal number of ancestral configurations tabulated at the species tree root node and the largest number of labeled histories possible for a labeled topology occur for trees with precisely the same unlabeled shape. For ancestral configurations at the root, the maximum increases with [Formula: see text], where [Formula: see text] is a quadratic recurrence constant. Under a uniform distribution over the set of labeled trees of given size, the mean number of root ancestral configurations grows with [Formula: see text] and the variance with ∼[Formula: see text]. The results provide a contribution to the combinatorial study of gene trees and species trees.
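To make the notion of a root ancestral configuration concrete, the following is a minimal sketch for the matching case (gene tree and species tree with the same labeled topology). It rests on an assumption that the abstract does not state: the gene lineages arriving at a node from one child subtree are either the single, fully coalesced lineage of that subtree or any configuration already possible at the child node. Under that reading, the count at an internal node is the product of (count + 1) over its two children, with count 0 at a leaf. The nested-tuple tree encoding and all function names are illustrative and are not taken from the paper.

```python
from itertools import product


def is_leaf(tree):
    # Leaves are plain strings; internal nodes are 2-tuples (left, right).
    return isinstance(tree, str)


def label(tree):
    # Canonical name of the single gene lineage subtending `tree`.
    if is_leaf(tree):
        return tree
    return "(" + ",".join(sorted(label(child) for child in tree)) + ")"


def arriving_sets(tree):
    # Sets of lineages from `tree` that can reach the node directly above it:
    # the fully coalesced lineage, plus (assumption) every configuration
    # possible at the root of `tree` itself.
    whole = [frozenset([label(tree)])]
    return whole if is_leaf(tree) else whole + root_configurations(tree)


def root_configurations(tree):
    # Explicit enumeration: combine one arriving set from each child.
    if is_leaf(tree):
        return []
    left, right = tree
    return [a | b for a, b in product(arriving_sets(left), arriving_sets(right))]


def count_root_configurations(tree):
    # Same count through the assumed product recurrence, without enumeration.
    if is_leaf(tree):
        return 0
    left, right = tree
    return (count_root_configurations(left) + 1) * (
        count_root_configurations(right) + 1
    )


caterpillar4 = ((("a", "b"), "c"), "d")
balanced4 = (("a", "b"), ("c", "d"))

print(count_root_configurations(caterpillar4))  # 3
print(count_root_configurations(balanced4))     # 4
print(sorted(sorted(cfg) for cfg in root_configurations(balanced4)))
```

For the balanced four-taxon tree, the enumeration lists the four sets {a, b, c, d}, {(a,b), c, d}, {a, b, (c,d)} and {(a,b), (c,d)}. The explicit enumeration is only practical for small trees; the recurrence gives the same count without building the sets.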