Luke, Steven G.
Department of Psychology, Brigham Young University, 1001 Spencer W. Kimball Tower, Provo, UT, 84602, USA.
Behav Res Methods. 2017 Aug;49(4):1494-1502. doi: 10.3758/s13428-016-0809-y.
Mixed-effects models are being used ever more frequently in the analysis of experimental data. However, in the lme4 package in R, the standards for evaluating the significance of fixed effects in these models (i.e., obtaining p-values) are somewhat vague. There are good reasons for this, but because researchers using these models are often required to report p-values, some method for evaluating the significance of the model output is needed. This paper reports the results of simulations showing that the two most common methods for evaluating significance, using likelihood ratio tests and applying the z distribution to the Wald t values from the model output (t-as-z), are somewhat anti-conservative, especially for smaller sample sizes. Other methods for evaluating significance, including parametric bootstrapping and the Kenward-Roger and Satterthwaite approximations for degrees of freedom, were also evaluated. The results of these simulations suggest that Type I error rates are closest to .05 when models are fitted using REML and p-values are derived using the Kenward-Roger or Satterthwaite approximations, as both approximations produced acceptable Type I error rates even for smaller samples.
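The anti-conservatism of the t-as-z method can be illustrated outside of lme4 entirely. The sketch below (a simplified illustration, not the paper's simulation design) runs many one-sample tests on small null samples and converts each t statistic to a p-value using the standard normal distribution, as t-as-z does. Because the t distribution has heavier tails than z at small degrees of freedom, the observed Type I error rate lands well above the nominal .05.

```python
# Illustration (not from the paper): why "t-as-z" p-values are
# anti-conservative for small samples. Each simulated dataset is drawn
# under the null, so any rejection is a Type I error.
import math
import random
import statistics

def z_pvalue(t):
    """Two-sided p-value obtained by treating t as standard normal."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))

def simulate_type1(n=5, sims=20000, alpha=0.05, seed=1):
    """Empirical Type I error rate of the t-as-z method at sample size n."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        t = statistics.mean(sample) / se
        if z_pvalue(t) < alpha:
            rejections += 1
    return rejections / sims

# With n = 5 the rejection rate is far above the nominal .05;
# with a larger n it shrinks back toward .05.
print(simulate_type1(n=5))
print(simulate_type1(n=50))
```

This mirrors the abstract's finding in miniature: the inflation is largest for small samples, which is precisely where degrees-of-freedom corrections such as Kenward-Roger and Satterthwaite matter most.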