Estimating trees from filtered data: identifiability of models for morphological phylogenetics.

Affiliation

Department of Mathematics and Statistics, University of Alaska Fairbanks, PO Box 756660, Fairbanks, AK 99775, USA.

Publication information

J Theor Biol. 2010 Mar 7;263(1):108-19. doi: 10.1016/j.jtbi.2009.12.001. Epub 2009 Dec 11.

DOI:10.1016/j.jtbi.2009.12.001

PMID:20004210

Abstract

As an alternative to parsimony analyses, stochastic models have been proposed (Lewis, 2001; Nylander et al., 2004) for morphological characters, so that maximum likelihood or Bayesian analyses may be used for phylogenetic inference. A key feature of these models is that they account for ascertainment bias, in that only varying, or parsimony-informative characters are observed. However, statistical consistency of such model-based inference requires that the model parameters be identifiable from the joint distribution they entail, and this issue has not been addressed. Here we prove that parameters for several such models, with finite state spaces of arbitrary size, are identifiable, provided the tree has at least eight leaves. If the tree topology is already known, then seven leaves suffice for identifiability of the numerical parameters. The method of proof involves first inferring a full distribution of both parsimony-informative and non-informative pattern joint probabilities from the parsimony-informative ones, using phylogenetic invariants. The failure of identifiability of the tree parameter for four-taxon trees is also investigated.
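The filtering described in the abstract can be illustrated with a small numerical sketch. It is not the paper's method or proof. It assumes the k-state symmetric Mk model of Lewis (2001) as one example of the models in question, computes the full joint distribution of leaf patterns on a four-leaf tree by brute force, and then conditions on the patterns that survive the filter (variable, or parsimony-informative). All function names and branch lengths are hypothetical. Four leaves are used only to keep the enumeration small; the identifiability results stated above require at least eight leaves (seven if the topology is known).

```python
import itertools
import math
from collections import Counter


def mk_transition(k, t):
    """Mk model (assumed here): probabilities of ending in the same state or in
    one particular different state after a branch of length t. Requires k >= 2."""
    decay = math.exp(-k * t / (k - 1))
    same = 1.0 / k + (k - 1.0) / k * decay
    diff = 1.0 / k - decay / k
    return same, diff


def quartet_pattern_probs(k, branch_lengths):
    """Full (unfiltered) joint distribution of leaf patterns on the tree
    ((a,b),(c,d)). branch_lengths = (ta, tb, tc, td, t_internal)."""
    ta, tb, tc, td, ti = branch_lengths

    def p(t, x, y):
        same, diff = mk_transition(k, t)
        return same if x == y else diff

    states = range(k)
    probs = {}
    for pattern in itertools.product(states, repeat=4):
        a, b, c, d = pattern
        total = 0.0
        # Sum over the states u, v at the two internal nodes (uniform root at u).
        for u in states:
            for v in states:
                total += ((1.0 / k) * p(ta, u, a) * p(tb, u, b)
                          * p(ti, u, v) * p(tc, v, c) * p(td, v, d))
        probs[pattern] = total
    return probs


def is_variable(pattern):
    """True if at least two distinct states appear among the leaves."""
    return len(set(pattern)) > 1


def is_parsimony_informative(pattern):
    """True if at least two states each appear in at least two leaves."""
    counts = Counter(pattern)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def condition_on(probs, keep):
    """Distribution of the observed (filtered) patterns: drop the patterns
    rejected by `keep` and renormalize the rest."""
    kept = {pat: pr for pat, pr in probs.items() if keep(pat)}
    z = sum(kept.values())
    return {pat: pr / z for pat, pr in kept.items()}


if __name__ == "__main__":
    full = quartet_pattern_probs(2, (0.1, 0.2, 0.3, 0.4, 0.05))
    observed = condition_on(full, is_parsimony_informative)
    for pattern, prob in sorted(observed.items()):
        print(pattern, round(prob, 4))
```

Only the conditioned distribution returned by `condition_on` is available to an analysis of filtered data. The identifiability question addressed in the abstract is whether the tree and numerical parameters can be recovered from that conditioned distribution alone.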
