Center for Population Biology, University of California, Davis, California 95616, USA.
Evolution. 2012 Jul;66(7):2240-51. doi: 10.1111/j.1558-5646.2011.01574.x. Epub 2012 Feb 19.
Phylogenetic comparative methods may fail to produce meaningful results when either the underlying model is inappropriate or the data contain insufficient information to inform the inference. The ability to measure the statistical power of these methods has become crucial to ensure that data quantity keeps pace with growing model complexity. Through simulations, we show that commonly applied model choice methods based on information criteria can have remarkably high error rates; this is a problem because methods to estimate the uncertainty or power are not widely known or applied. Furthermore, the power of comparative methods can depend significantly on the structure of the data. We describe a Monte Carlo-based method that addresses both of these challenges, and show how this approach both quantifies and substantially reduces errors relative to information criteria. The method also produces meaningful confidence intervals for model parameters. We illustrate how the power to distinguish different models, such as varying levels of selection, varies with both the number of taxa and the structure of the phylogeny. We provide an open-source implementation in the pmc ("Phylogenetic Monte Carlo") package for the R programming language. We hope such power analysis becomes a routine part of model comparison in comparative methods.
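As a rough illustration of what a Monte Carlo approach to model comparison and power can look like, the sketch below uses a parametric bootstrap of the likelihood ratio statistic. The abstract does not spell out the algorithm, so this choice of technique is an assumption, and the code does not use or reproduce the pmc package. The two models are deliberately simple stand-ins with no phylogeny: model A is a normal distribution with mean fixed at zero, model B lets the mean vary. All function names (`lr_stat`, `monte_carlo_comparison`, and so on) are hypothetical.

```python
import math
import random


def loglik_normal(xs, mu, var):
    """Log-likelihood of i.i.d. normal data with mean mu and variance var."""
    n = len(xs)
    sq = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)


def fit_a(xs):
    """Toy model A (simpler): mean fixed at 0, variance estimated by ML."""
    return 0.0, sum(x * x for x in xs) / len(xs)


def fit_b(xs):
    """Toy model B (richer): mean and variance both estimated by ML."""
    mu = sum(xs) / len(xs)
    return mu, sum((x - mu) ** 2 for x in xs) / len(xs)


def lr_stat(xs):
    """Likelihood ratio statistic: 2 * (logLik under B - logLik under A)."""
    return 2 * (loglik_normal(xs, *fit_b(xs)) - loglik_normal(xs, *fit_a(xs)))


def simulate(mu, var, n, rng):
    """Draw n observations from a normal model with the given parameters."""
    sd = math.sqrt(var)
    return [rng.gauss(mu, sd) for _ in range(n)]


def monte_carlo_comparison(xs, n_sims=500, alpha=0.05, seed=1):
    """Parametric-bootstrap comparison of toy models A and B.

    Simulates data sets under each fitted model, refits both models to every
    simulated data set, and compares the resulting distributions of the
    likelihood ratio statistic with the observed value.
    """
    rng = random.Random(seed)
    n = len(xs)
    observed = lr_stat(xs)
    mu_a, var_a = fit_a(xs)
    mu_b, var_b = fit_b(xs)

    # Distribution of the statistic when model A generated the data.
    null_stats = sorted(lr_stat(simulate(mu_a, var_a, n, rng)) for _ in range(n_sims))
    # Distribution of the statistic when model B generated the data.
    alt_stats = [lr_stat(simulate(mu_b, var_b, n, rng)) for _ in range(n_sims)]

    # Rejection threshold: empirical (1 - alpha) quantile under model A.
    threshold = null_stats[min(n_sims - 1, int((1 - alpha) * n_sims))]
    # Monte Carlo p-value with the usual +1 correction.
    p_value = (1 + sum(s >= observed for s in null_stats)) / (n_sims + 1)
    # Estimated power: how often data simulated under B exceed the threshold.
    power = sum(s > threshold for s in alt_stats) / n_sims

    return {"observed": observed, "threshold": threshold,
            "p_value": p_value, "power": power}


if __name__ == "__main__":
    data = simulate(0.3, 1.0, 30, random.Random(7))
    print(monte_carlo_comparison(data))
```

Rerunning `monte_carlo_comparison` on simulated data sets of different sizes gives a crude picture of how power changes with sample size, which is loosely analogous to the dependence on the number of taxa described above. A phylogenetic version would replace the toy fitting and simulation functions with models of trait evolution on a tree.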