Evans S N, Zhou X
Department of Statistics, University of California at Berkeley, 94720-3860, USA.
J Comput Biol. 1998 Winter;5(4):713-24. doi: 10.1089/cmb.1998.5.713.
The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of m taxa using nucleotide sequence data. Models for the respective probabilities of the 4^m possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically non-zero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another. Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free Z-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.
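The abstract reduces the search for a minimal generating set to finding a basis of a lattice, i.e. a free Z-module. It does not spell out the algorithm itself, so the following is only a generic sketch of one standard way to obtain such a basis: a Z-basis of the integer kernel {v in Z^n : Av = 0} of an integer matrix A, computed with unimodular column operations (a Hermite-normal-form-style reduction). The function names (`ext_gcd`, `integer_kernel_basis`) and the toy matrices are hypothetical illustrations, not derived from any substitution model and not the authors' method.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def ext_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) >= 0 and x*a + y*b == g."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, x1, y1 = ext_gcd(b, a % b)
    return (g, y1, x1 - (a // b) * y1)


def _col_op(mat: Matrix, p: int, j: int, x: int, y: int, s: int, t: int) -> None:
    """In place: col_p <- x*col_p + y*col_j and col_j <- s*col_p + t*col_j (old values)."""
    for row in mat:
        cp, cj = row[p], row[j]
        row[p] = x * cp + y * cj
        row[j] = s * cp + t * cj


def integer_kernel_basis(A: Matrix) -> Matrix:
    """Return a Z-basis of the lattice {v in Z^n : A v = 0} (hypothetical helper).

    Column-reduces a copy M of A with unimodular column operations, recording
    them in U, so that M == A U holds throughout. Columns of M from the final
    pivot position onward are zero, so the matching columns of U lie in the
    kernel, and since U is unimodular they span the whole integer kernel.
    """
    n = len(A[0]) if A else 0
    M = [list(row) for row in A]
    U = [[int(r == c) for c in range(n)] for r in range(n)]
    p = 0  # index of the next pivot column
    for i in range(len(M)):
        if p >= n:
            break
        for j in range(p + 1, n):
            a, b = M[i][p], M[i][j]
            if b == 0:
                continue
            g, x, y = ext_gcd(a, b)
            # The 2x2 transform [[x, -b/g], [y, a/g]] has determinant 1,
            # and it sends row i's entries (a, b) to (g, 0).
            s, t = -b // g, a // g
            _col_op(M, p, j, x, y, s, t)
            _col_op(U, p, j, x, y, s, t)
        if M[i][p] != 0:
            p += 1
    return [[U[r][k] for r in range(n)] for k in range(p, n)]
```

For the hypothetical toy matrix `A = [[1, 1, 1, 1], [0, 1, 2, 3]]`, `integer_kernel_basis(A)` returns two vectors; in general the basis has n - rank(A) elements. Using only unimodular column operations (rather than rational Gaussian elimination) is what guarantees the result spans the full integer lattice, not merely a finite-index sublattice of it.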