Hendy M D, Penny D
Mathematics Department, Massey University, Palmerston North, New Zealand.
J Comput Biol. 1996 Spring;3(1):19-31. doi: 10.1089/cmb.1996.3.19.
For various models of sequence evolution, the set of linear invariants (linear functions of the nucleotide pattern frequencies whose expected value is zero under the model) forms a vector space, the invariant space. Here we distinguish between the model of nucleotide substitution and the phylogenetic tree T describing the paths on which these changes occur. We describe a procedure to construct a basis of the invariant space for models that extend Kimura's three-substitution-type model of nucleotide change, and for its special cases, including the Jukes-Cantor and Cavender-Farris models. For those models where the dimension of the invariant space is independent of the tree topology, the dimension is determined as a function of the number of sequences. These dimensions are calculated for the case where the nucleotide distribution at the root is unspecified, both with and without the molecular clock hypothesis. The invariants have a number of potential applications, including tree identification and testing the fit of models (which could include the molecular clock) to sequence data.
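To make the notion of an invariant space concrete, the following is a minimal sketch and not the construction described in the paper. It assumes a two-state symmetric (Cavender-Farris-type) model on a star tree with a uniform root distribution, which is a simplification of the setting above. It evaluates the pattern probabilities at several parameter values and takes the dimension of the invariant space to be the number of patterns minus the rank of the resulting matrix. The names `pattern_probs`, `rank`, and `invariant_dimension` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def pattern_probs(flip):
    """Leaf-pattern probabilities on a star tree, two-state symmetric model.

    flip[i] is the probability that leaf i differs from the root.
    Assumption: the root state is uniform on {0, 1}.
    """
    probs = []
    for leaves in product((0, 1), repeat=len(flip)):
        total = Fraction(0)
        for root in (0, 1):
            term = Fraction(1, 2)
            for state, f in zip(leaves, flip):
                term *= f if state != root else 1 - f
            total += term
        probs.append(total)
    return probs


def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def invariant_dimension(n_leaves):
    """Number of patterns minus the rank of the sampled probability vectors."""
    # Two distinct parameter values per edge are enough here, because each
    # pattern probability is a multilinear polynomial in the edge parameters.
    grid = (Fraction(1, 10), Fraction(3, 10))
    samples = [pattern_probs(list(point)) for point in product(grid, repeat=n_leaves)]
    return 2 ** n_leaves - rank(samples)
```

Under these assumptions `invariant_dimension(3)` evaluates to 4: the probability of a pattern equals that of its complement, which accounts for four independent linear relations among the eight pattern frequencies. Exact rational arithmetic is used so that the rank is not affected by floating-point tolerance.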