Fisher Rebecca, Wilson Shaun K, Sin Tsai M, Lee Ai C, Langlois Tim J
Australian Institute of Marine Science, UWA Oceans Institute, Crawley, WA, Australia.
The UWA Oceans Institute and School of Biological Sciences, The University of Western Australia, Crawley, WA, Australia.
Ecol Evol. 2018 May 20;8(12):6104-6113. doi: 10.1002/ece3.4134. eCollection 2018 Jun.
Full-subsets information-theoretic approaches are becoming increasingly popular tools for exploring predictive power and variable importance when a wide range of candidate predictors is being considered. Here, we describe a simple function in the statistical programming language R that can be used to construct, fit, and compare a complete model set of possible ecological or environmental predictors, given a response variable of interest and a starting generalized additive (mixed) model fit. The main advantages are that a complete model need not be fitted as the starting point for candidate model set construction (meaning that a greater number of predictors can potentially be explored than might be possible with functions such as dredge); that model sets can include interactions between factors and continuous nonlinear predictors; and that models with correlated predictors are removed automatically (based on a user-defined criterion for exclusion). The function takes continuous predictors, which are fitted using smoothers via gam or gamm (mgcv) or gamm4, as well as factor variables, which are included on their own, as two-level interaction terms within the gam smooth (via the "by" argument), or in interaction with one another. The function allows any model to be constructed and used as a null model, and takes a range of arguments that control the model set being constructed, including which continuous predictors are cyclic or linear, the smoothing algorithm used, and the maximum complexity allowed for smooth terms. The use of the function is demonstrated via case studies that highlight how appropriate model sets can be easily constructed and the broader utility of the approach for exploratory ecology.
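The model-set construction step described in the abstract can be illustrated with a minimal, language-neutral sketch. It is written in Python rather than R and is not the authors' function: the names `build_model_set` and `to_formula`, the toy predictors, and the cutoff of 0.5 are all hypothetical choices made for illustration. The sketch only enumerates candidate predictor subsets and drops those containing a highly correlated pair; it does not fit or compare any models.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def build_model_set(predictors, max_predictors=2, cor_cutoff=0.5):
    """Enumerate predictor subsets, skipping any with a correlated pair.

    `predictors` maps a predictor name to its observed values.
    `cor_cutoff` is an illustrative exclusion criterion, not a recommended value.
    """
    names = sorted(predictors)
    # Record every pair whose absolute correlation exceeds the cutoff.
    excluded_pairs = {
        frozenset(pair)
        for pair in combinations(names, 2)
        if abs(pearson(predictors[pair[0]], predictors[pair[1]])) > cor_cutoff
    }
    model_set = [()]  # the empty tuple stands for the null model
    for size in range(1, max_predictors + 1):
        for subset in combinations(names, size):
            pairs = (frozenset(p) for p in combinations(subset, 2))
            if not any(p in excluded_pairs for p in pairs):
                model_set.append(subset)
    return model_set


def to_formula(subset, response="y"):
    """Render a subset as a GAM-style formula string (text only, nothing is fitted)."""
    terms = " + ".join(f"s({name})" for name in subset) or "1"
    return f"{response} ~ {terms}"


# Toy data (hypothetical): temp tracks depth closely, cover is unrelated to both.
example = {
    "depth": [1, 2, 3, 4, 5, 6],
    "temp": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    "cover": [5, 1, 4, 2, 6, 3],
}

for model in build_model_set(example):
    print(to_formula(model))
```

Because candidate models are built up from the null model rather than pared down from a fitted full model, the number of predictors explored is not limited by what a single full model could support, which is the advantage the abstract attributes to this style of construction.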