Department of Mathematics and Statistics, University of Alaska Fairbanks, PO Box 756660, Fairbanks, AK 99775-6660, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):710-22. doi: 10.1109/TCBB.2010.79.
Phylogenetic data arising on two possibly different tree topologies might be mixed through several biological mechanisms, including incomplete lineage sorting or horizontal gene transfer in the case of different topologies, or simply different substitution processes on characters in the case of the same topology. Recent work on a 2-state symmetric model of character change showed that, for 4 taxa, such a mixture model has nonidentifiable parameters; thus, under such circumstances it is theoretically impossible to determine the two tree topologies from any amount of data. Here, the question of identifiability is investigated for two-tree mixtures of the 4-state group-based models, which are more relevant to DNA sequence data. Using algebraic techniques, we show that the tree parameters are identifiable for the Jukes-Cantor (JC) and Kimura 2-parameter (K2P) models. We also prove that generic substitution parameters for the JC mixture models are identifiable and, for the K2P and Kimura 3-parameter (K3P) models, obtain generic identifiability results for mixtures on the same tree. This indicates that the full phylogenetic signal remains in such mixtures, and that the 2-state symmetric result is thus a misleading guide to the behavior of other models.
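To make the objects in this abstract concrete, here is a minimal illustrative sketch in Python. It is not the authors' code and is not claimed to be their exact method. It computes site-pattern probabilities for a two-tree mixture of 4-taxon trees under the JC model and then applies the discrete Fourier (Hadamard) transform over the group Z2 x Z2, a standard algebraic tool for group-based models. For a single tree, the transformed coordinates vanish unless the group labels sum to the identity, and otherwise factor into one coefficient per edge. Because the transform is linear, a mixture's coordinates are a weighted sum of two such products. The nucleotide-to-group labelling, the branch lengths, the mixing weight 0.3, and every function name (`quartet_distribution`, `mixture`, `fourier`, `predicted_fourier`, and so on) are hypothetical choices made for illustration.

```python
import itertools
import math

# Hypothetical labelling of nucleotides by elements of the Klein four-group
# Z2 x Z2; JC is invariant under relabelling, so any bijection would do.
GROUP = [(0, 0), (0, 1), (1, 0), (1, 1)]  # A, C, G, T


def add(g, h):
    """Group operation in Z2 x Z2 (componentwise addition mod 2)."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)


def chi(h, g):
    """Character chi_h(g) = (-1)^(h . g) of Z2 x Z2."""
    return -1 if (h[0] * g[0] + h[1] * g[1]) % 2 else 1


def jc_matrix(t):
    """Jukes-Cantor transition matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def quartet_distribution(split, lengths):
    """Site-pattern probabilities on the 4-taxon tree with split ab|cd.

    split: ((a, b), (c, d)) with leaf indices 0..3, e.g. ((0, 1), (2, 3)).
    lengths: pendant edges to a, b, c, d, then the internal edge.
    The root sits at the internal node joining a and b, with the uniform
    (JC-stationary) root distribution; internal states are summed by brute force.
    """
    (a, b), (c, d) = split
    P = [jc_matrix(t) for t in lengths]
    dist = {}
    for x in itertools.product(range(4), repeat=4):
        dist[x] = sum(
            0.25 * P[0][u][x[a]] * P[1][u][x[b]] * P[4][u][v]
            * P[2][v][x[c]] * P[3][v][x[d]]
            for u in range(4) for v in range(4)
        )
    return dist


def mixture(pi, dist1, dist2):
    """Two-tree mixture: weight pi on the first tree, 1 - pi on the second."""
    return {x: pi * dist1[x] + (1 - pi) * dist2[x] for x in dist1}


def fourier(dist):
    """Discrete Fourier (Hadamard) transform over (Z2 x Z2)^4."""
    return {
        h: sum(p * math.prod(chi(GROUP[h[i]], GROUP[x[i]]) for i in range(4))
               for x, p in dist.items())
        for h in itertools.product(range(4), repeat=4)
    }


def edge_coefficient(t, g):
    """Fourier coefficient of one JC edge: 1 if g is the identity, else e^(-4t/3)."""
    return 1.0 if g == (0, 0) else math.exp(-4.0 * t / 3.0)


def predicted_fourier(split, lengths, h):
    """Closed form for one tree: zero unless the h_i sum to the identity,
    otherwise a product of one coefficient per edge."""
    (a, b), (c, d) = split
    g = [GROUP[i] for i in h]
    if add(add(g[0], g[1]), add(g[2], g[3])) != (0, 0):
        return 0.0
    coeff = edge_coefficient(lengths[4], add(g[a], g[b]))  # internal edge
    for t, leaf in zip(lengths[:4], (a, b, c, d)):
        coeff *= edge_coefficient(t, g[leaf])
    return coeff


# Hypothetical example: a 30/70 mixture of trees 12|34 and 13|24.
SPLIT_1, LEN_1 = ((0, 1), (2, 3)), [0.1, 0.2, 0.15, 0.3, 0.25]
SPLIT_2, LEN_2 = ((0, 2), (1, 3)), [0.2, 0.1, 0.3, 0.15, 0.4]
PI = 0.3
mix = mixture(PI, quartet_distribution(SPLIT_1, LEN_1),
              quartet_distribution(SPLIT_2, LEN_2))
q_mix = fourier(mix)
```

Each Fourier coordinate of `mix` equals `PI * predicted_fourier(SPLIT_1, LEN_1, h) + (1 - PI) * predicted_fourier(SPLIT_2, LEN_2, h)`. Identifiability questions of the kind the abstract describes ask whether such a weighted sum of two product forms determines the topologies and parameters. The brute-force summation favors clarity over efficiency.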