Redding David W, Lucas Tim C D, Blackburn Tim M, Jones Kate E
Centre for Biodiversity and Environment Research, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom.
Big Data Institute, University of Oxford, Oxford, United Kingdom.
PLoS One. 2017 Nov 30;12(11):e0187602. doi: 10.1371/journal.pone.0187602. eCollection 2017.
Statistical approaches for inferring the spatial distribution of taxa (Species Distribution Models, SDMs) commonly rely on available occurrence data, which are often clumped and geographically restricted. Although available SDM methods address some of these factors, they could be more directly and accurately modelled using a spatially-explicit approach. Software to fit models with spatial autocorrelation parameters in SDMs is now widely available, but whether such approaches improve predictions compared to other methodologies is unknown. Here, within a simulated environment using 1000 generated species' ranges, we compared the performance of two commonly used non-spatial SDM methods (Maximum Entropy Modelling, MAXENT, and boosted regression trees, BRT) to a spatial Bayesian SDM method (fitted using R-INLA) when the underlying data exhibit varying combinations of clumping and geographic restriction. Finally, we tested how recommended methodological settings designed to account for spatially non-random patterns in the data impact inference. The spatial Bayesian SDM method was the most consistently accurate method, being in the top 2 most accurate methods in 7 out of 8 data sampling scenarios. Within high-coverage sample datasets, all methods performed fairly similarly. When sampling points were randomly spread, BRT had a 1-3% greater accuracy over the other methods, and when samples were clumped, the spatial Bayesian SDM method had a 4-8% better AUC score. In contrast, when sampling points were restricted to a small section of the true range, all methods were on average 10-12% less accurate, with greater variation among the methods. Model inference under the recommended settings to account for autocorrelation was not impacted by clumping or restriction of data, except for the complexity of the spatial regression term in the spatial Bayesian model.
Methods, such as those made available by R-INLA, can be successfully used to account for spatial autocorrelation in an SDM context and, by taking account of random effects, produce outputs that can better elucidate the role of covariates in predicting species occurrence. Given that it is often unclear what the drivers are behind data clumping in an empirical occurrence dataset, or indeed how geographically restricted these data are, spatially-explicit Bayesian SDMs may be the better choice when modelling the spatial distribution of target species.
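The abstract does not include the simulation code, so the following is only a minimal sketch of two ideas it relies on: scoring predictions with AUC, and drawing survey points that are clumped and/or geographically restricted. The function names (`auc`, `sample_cells`), the grid layout, and the parameters `n_centres` and `radius` are hypothetical choices made for illustration. They are not the authors' implementation, and no MAXENT, BRT, or R-INLA model is fitted here.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a presence cell outranks an absence cell.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sample_cells(cells, n, clumped=False, restrict_to=None,
                 n_centres=3, radius=2, rng=None):
    """Draw up to n distinct survey cells from a list of (x, y) grid cells.

    restrict_to: optional predicate; only cells passing it can be sampled
                 (an assumed stand-in for geographic restriction).
    clumped:     if True, only cells within `radius` (in x and in y) of a
                 few randomly chosen centres can be sampled (an assumed
                 stand-in for clumped survey effort).
    """
    rng = rng or random.Random(0)
    pool = [c for c in cells if restrict_to is None or restrict_to(c)]
    if clumped:
        centres = rng.sample(pool, min(n_centres, len(pool)))
        pool = [c for c in pool
                if any(abs(c[0] - cx) <= radius and abs(c[1] - cy) <= radius
                       for cx, cy in centres)]
    return rng.sample(pool, min(n, len(pool)))

# Hypothetical 20 x 20 study area and four sampling designs.
grid = [(x, y) for x in range(20) for y in range(20)]
west_only = lambda c: c[0] < 5  # assumed restriction: western quarter only
designs = {
    "random, unrestricted": sample_cells(grid, 30),
    "clumped, unrestricted": sample_cells(grid, 30, clumped=True),
    "random, restricted": sample_cells(grid, 30, restrict_to=west_only),
    "clumped, restricted": sample_cells(grid, 30, clumped=True,
                                        restrict_to=west_only),
}
```

In a comparison of this kind, each design would supply training points to every model, and `auc` would then be evaluated against the full simulated range rather than against the sampled cells alone.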