Oberpriller Johannes, de Souza Leite Melina, Pichler Maximilian
Theoretical Ecology, University of Regensburg, Regensburg, Germany.
Department of Ecology, University of São Paulo, São Paulo, Brazil.
Ecol Evol. 2022 Jul 24;12(7):e9062. doi: 10.1002/ece3.9062. eCollection 2022 Jul.
Biological data are often intrinsically hierarchical (e.g., species from different genera, plants within different mountain regions), which has made mixed-effects models a common analysis tool in ecology and evolution because they can account for the resulting non-independence. Many questions about their practical application have been resolved, but one is still debated: Should a grouping variable with a low number of levels be treated as a random or a fixed effect? In such situations, the variance estimate of the random effect can be imprecise, but it is unknown whether this affects the statistical power and type I error rates of the fixed effects of interest. Here, we analyzed the consequences of treating a grouping variable with 2-8 levels as a fixed or random effect in correctly specified and alternative (under- or overparametrized) models. We calculated type I error rates and statistical power for all model specifications and quantified the influence of study design on these quantities. We found no influence of model choice on the type I error rate and power of the population-level effect (slope) for random intercept-only models. However, when intercepts and slopes vary in the data-generating process, using a random slope and intercept model, and switching to a fixed-effects model in case of a singular fit, avoids overconfidence in the results. Additionally, the number of levels and the differences between them strongly influence power and type I error. We conclude that inferring the correct random-effect structure is of great importance for obtaining correct type I error rates. We encourage researchers to start with a mixed-effects model regardless of the number of levels in the grouping variable and to switch to a fixed-effects model only in case of a singular fit. With these recommendations, we allow for more informed choices about study design and data analysis and make ecological inference with mixed-effects models more robust for small numbers of levels.
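The point about random-effect structure and type I error can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation code: it uses only the Python standard library, and all parameter values (4 groups, 25 observations per group, the standard deviations) are hypothetical choices made for illustration. Data are generated with varying intercepts and slopes and a true population slope of zero. The slope is then tested twice: once with a model that has group-specific intercepts but a single common slope (ignoring the slope variation), and once with a two-stage test on the per-group slope estimates. The two-stage test is a simple stand-in for a random slope and intercept model, assumed here only because it needs no model-fitting library; it is not a mixed-effects fit.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t, keyed by the number of
# groups k (df = k - 1), covering the 2-8 levels discussed in the abstract.
T_CRIT = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776, 6: 2.571, 7: 2.447, 8: 2.365}
# Normal approximation for the common-slope test (its residual df is large here).
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)


def simulate_once(rng, n_groups, n_per_group, sd_intercept, sd_slope, sd_resid):
    """Simulate one data set with a true population slope of 0; test it twice."""
    # Every group shares the same equally spaced predictor values in [-1, 1].
    xs = [-1 + 2 * i / (n_per_group - 1) for i in range(n_per_group)]
    x_mean = statistics.fmean(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)

    groups, slopes = [], []
    for _ in range(n_groups):
        intercept = rng.gauss(0.0, sd_intercept)  # group-specific intercept
        slope = rng.gauss(0.0, sd_slope)          # group-specific slope, mean 0
        ys = [intercept + slope * x + rng.gauss(0.0, sd_resid) for x in xs]
        y_mean = statistics.fmean(ys)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)                  # per-group least-squares slope
        groups.append((ys, y_mean))

    # Analysis 1: group-specific intercepts, one common slope.
    # With identical x values per group, the common slope is the mean of the
    # per-group slopes.
    common = statistics.fmean(slopes)
    rss = sum((y - y_mean - common * (x - x_mean)) ** 2
              for ys, y_mean in groups for x, y in zip(xs, ys))
    df = n_groups * n_per_group - n_groups - 1
    se_common = math.sqrt(rss / df / (n_groups * sxx))
    reject_common = abs(common / se_common) > Z_CRIT

    # Analysis 2: one-sample t-test on the per-group slopes (df = k - 1),
    # which lets between-group slope variation enter the standard error.
    se_between = statistics.stdev(slopes) / math.sqrt(n_groups)
    reject_two_stage = abs(common / se_between) > T_CRIT[n_groups]
    return reject_common, reject_two_stage


def type_one_error_rates(n_sim=2000, seed=1, n_groups=4, n_per_group=25,
                         sd_intercept=1.0, sd_slope=0.5, sd_resid=1.0):
    """Return the rejection rates (common-slope test, two-stage test) at alpha = 0.05."""
    rng = random.Random(seed)
    hits_common = hits_two_stage = 0
    for _ in range(n_sim):
        r_common, r_two_stage = simulate_once(
            rng, n_groups, n_per_group, sd_intercept, sd_slope, sd_resid)
        hits_common += r_common
        hits_two_stage += r_two_stage
    return hits_common / n_sim, hits_two_stage / n_sim


if __name__ == "__main__":
    common_rate, two_stage_rate = type_one_error_rates()
    print(f"common-slope model: {common_rate:.3f}  two-stage test: {two_stage_rate:.3f}")
```

Under these assumed settings, the common-slope analysis should reject the true null hypothesis far more often than the nominal 5%, while the two-stage test should stay close to it. Setting `sd_slope=0.0` gives a random intercept-only data-generating process, in which both tests should behave similarly.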