Buckley Thomas R, Cunningham Clifford W
Department of Biology, Duke University, Durham, North Carolina, USA.
Mol Biol Evol. 2002 Apr;19(4):394-405. doi: 10.1093/oxfordjournals.molbev.a004094.
The use of parameter-rich substitution models in molecular phylogenetics has been criticized on the basis that these models can cause a reduction both in accuracy and in the ability to discriminate among competing topologies. We have explored the relationship between nucleotide substitution model complexity and nonparametric bootstrap support under maximum likelihood (ML) for six data sets for which the true relationships are known with a high degree of certainty. We also performed equally weighted maximum parsimony analyses in order to assess the effects of ignoring branch length information during tree selection. We observed that maximum parsimony gave the lowest mean estimate of bootstrap support for the correct set of nodes relative to the ML models for every data set except one. For several data sets, we established that the exact distribution used to model among-site rate variation was critical for a successful phylogenetic analysis. Site-specific rate models were shown to perform very poorly relative to gamma and invariable sites models for several of the data sets, most likely because of the gross underestimation of branch lengths. The invariable sites model also performed poorly for several data sets where this model had a poor fit to the data, suggesting that addition of the gamma distribution can be critical. Estimates of bootstrap support for the correct nodes often increased under gamma and invariable sites models relative to equal rates models. Our observations are contrary to the prediction that such models cause reduced confidence in phylogenetic hypotheses. Our results raise several issues regarding the process of model selection, and we briefly discuss model selection uncertainty and the role of sensitivity analyses in molecular phylogenetics.
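Why the among-site rate assumption matters for branch lengths can be seen with pairwise distances under the Jukes-Cantor (JC69) model. The following is a minimal sketch and not the authors' analysis, which used maximum likelihood on full trees. It assumes JC69 and compares three closed-form distances computed from the same observed proportion `p` of differing sites: equal rates, gamma-distributed rates with shape `alpha`, and a fraction `pinv` of invariable sites. It does not model the site-specific rate partitions discussed above. The function names are hypothetical.

```python
import math


def jc69_distance(p):
    """Equal-rates JC69 distance (substitutions/site) from proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for a finite JC69 distance")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc69_gamma_distance(p, alpha):
    """JC69 distance assuming gamma-distributed rates across sites with shape alpha."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for a finite distance")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Smaller alpha means stronger rate heterogeneity and a larger correction.
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


def jc69_invariant_distance(p, pinv):
    """JC69 distance assuming a fraction pinv of sites can never change."""
    if not 0 <= pinv < 1:
        raise ValueError("pinv must lie in [0, 1)")
    if not 0 <= p < 0.75 * (1 - pinv):
        raise ValueError("p is too large for the assumed fraction of variable sites")
    # All observed differences fall in the (1 - pinv) variable sites.
    return -0.75 * (1 - pinv) * math.log(1 - 4 * p / (3 * (1 - pinv)))


if __name__ == "__main__":
    p = 0.3  # hypothetical observed proportion of differing sites
    print(f"equal rates:       {jc69_distance(p):.4f}")                 # 0.3831
    print(f"invariable (0.3):  {jc69_invariant_distance(p, 0.3):.4f}")  # 0.4448
    print(f"gamma (alpha=1):   {jc69_gamma_distance(p, 1.0):.4f}")      # 0.5000
    print(f"gamma (alpha=0.5): {jc69_gamma_distance(p, 0.5):.4f}")      # 0.6667
```

For the same data, the equal-rates model gives the shortest distance in this example. As `alpha` grows large the gamma distance converges to the equal-rates value, and with `pinv = 0` the invariable-sites distance reduces to it exactly. This illustrates how a poorly chosen rate-variation assumption can underestimate branch lengths.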