Steel M A, Székely L A, Hendy M D
Mathematics and Statistics Department, University of Canterbury, Christchurch, NZ.
J Comput Biol. 1994 Summer;1(2):153-63. doi: 10.1089/cmb.1994.1.153.
For a sequence of colors evolving independently on a tree under a simple Markov model, we consider conditions under which the tree can be uniquely recovered from the "sequence spectrum", that is, the expected frequencies of the various leaf colorations. This is relevant for phylogenetic analysis (where colors represent nucleotides or amino acids and leaves represent extant taxa), because the sequence spectrum is estimated directly from a collection of aligned sequences. Allowing the rate of the evolutionary process to vary across sites is an important extension over most previous studies: we show that, given suitable restrictions on the rate distribution, the true tree (up to the placement of its root) is uniquely identified by its sequence spectrum. However, if the rate distribution is unknown and arbitrary, then for simple models it is possible for every tree to produce the same sequence spectrum. Hence there is a logical barrier to accurate, consistent phylogenetic inference for these models when no assumptions are made about the rate distribution. This result exploits a novel theorem on the action of polynomials with non-negative coefficients on sequences.
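To make the "sequence spectrum" concrete, here is a minimal sketch (not the paper's construction) that computes the expected frequency of every leaf coloration on a small quartet tree. It assumes the two-state symmetric (Cavender-Farris-Neyman) model, in which a branch of length t traversed at site rate r changes color with probability (1 - exp(-2rt))/2. It also assumes hypothetical branch lengths and a hypothetical discrete rate distribution. Under rate variation across sites, the spectrum is the weighted mixture of the fixed-rate spectra.

```python
import itertools
import math

# Hypothetical quartet tree ((0,1),(2,3)): leaves 0-3, internal nodes 4 and 5.
# Each edge is (parent, child, branch_length); node 4 acts as an arbitrary root.
EDGES = [(4, 0, 0.1), (4, 1, 0.2), (4, 5, 0.3), (5, 2, 0.1), (5, 3, 0.2)]
LEAVES = [0, 1, 2, 3]
INTERNAL = [4, 5]


def change_prob(length, rate):
    """Probability that the two-state symmetric process changes color on an edge."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * length))


def spectrum_at_rate(rate):
    """Expected frequency of every leaf coloration when all edges evolve at one rate."""
    spec = {}
    for leaf_colors in itertools.product((0, 1), repeat=len(LEAVES)):
        total = 0.0
        # Sum over the unobserved colors at the internal nodes.
        for internal_colors in itertools.product((0, 1), repeat=len(INTERNAL)):
            color = dict(zip(LEAVES, leaf_colors))
            color.update(zip(INTERNAL, internal_colors))
            prob = 0.5  # uniform stationary distribution at the root
            for parent, child, length in EDGES:
                p = change_prob(length, rate)
                prob *= p if color[parent] != color[child] else 1.0 - p
            total += prob
        spec[leaf_colors] = total
    return spec


def sequence_spectrum(rate_dist):
    """Mix fixed-rate spectra over a discrete rate distribution of (rate, weight) pairs."""
    mixed = {}
    for rate, weight in rate_dist:
        for pattern, freq in spectrum_at_rate(rate).items():
            mixed[pattern] = mixed.get(pattern, 0.0) + weight * freq
    return mixed


if __name__ == "__main__":
    # Hypothetical two-class rate distribution: half the sites slow, half fast.
    spec = sequence_spectrum([(0.5, 0.5), (1.5, 0.5)])
    for pattern, freq in sorted(spec.items()):
        print(pattern, round(freq, 4))
```

Because this model is reversible with a uniform stationary distribution, picking node 4 as the root does not change the result. This mirrors the "up to the placement of its root" caveat. Replacing the discrete rate list with a different distribution changes only the mixing weights, which is exactly the quantity the abstract says must be restricted for the tree to remain identifiable.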