Department of Forestry, University of Missouri, Columbia, Missouri, USA.
PLoS One. 2012;7(8):e44486. doi: 10.1371/journal.pone.0044486. Epub 2012 Aug 31.
Species distribution models require selection of species, study extent and spatial unit, statistical methods, variables, and assessment metrics. If absence data are not available, another important consideration is pseudoabsence generation. Different strategies for pseudoabsence generation can produce differing spatial representations of species.
We considered model outcomes from four different strategies for generating pseudoabsences. We generated pseudoabsences randomly by 1) selection from the entire study extent; 2) a two-step process of selection first from the entire study extent, followed by selection of pseudoabsences from areas with predicted probability <25%; 3) selection from plots surveyed without detection of species presence; and 4) a two-step process of selection first from plots surveyed without detection of species presence, followed by selection of pseudoabsences from areas with predicted probability <25%. We used Random Forests as our statistical method and sixteen predictor variables to model tree species with at least 150 records from Forest Inventory and Analysis surveys in the Laurentian Mixed Forest province of Minnesota.
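The four strategies differ only in the candidate pool (entire study extent versus surveyed plots without detection) and in whether a second, model-filtered draw follows the first. A minimal sketch of that logic is given below. It is not the authors' implementation: the function names, the `fit_predict` callable, and the one-variable toy model are all hypothetical stand-ins, and in practice `fit_predict` would wrap a Random Forests model fitted on the predictor variables.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical signature: (presences, pseudoabsences, cells_to_score) -> probabilities
FitPredict = Callable[[Sequence[float], Sequence[float], Sequence[float]], List[float]]


def sample_pseudoabsences(candidates: Sequence[float], n: int,
                          rng: random.Random) -> List[float]:
    """One-step strategies (1 and 3): random draw from the candidate pool.

    The pool is the entire study extent for strategy 1, or the plots
    surveyed without detection of the species for strategy 3.
    """
    pool = list(candidates)
    return rng.sample(pool, min(n, len(pool)))


def two_step_pseudoabsences(candidates: Sequence[float],
                            presences: Sequence[float],
                            n: int,
                            fit_predict: FitPredict,
                            rng: random.Random,
                            threshold: float = 0.25) -> List[float]:
    """Two-step strategies (2 and 4).

    Step 1: draw random pseudoabsences and fit a preliminary model.
    Step 2: redraw pseudoabsences only from cells whose predicted
    probability of presence is below `threshold` (25% in the text).
    """
    initial = sample_pseudoabsences(candidates, n, rng)
    probs = fit_predict(presences, initial, candidates)
    low_probability = [c for c, p in zip(candidates, probs) if p < threshold]
    return sample_pseudoabsences(low_probability, n, rng)


def toy_fit_predict(presences: Sequence[float],
                    pseudoabsences: Sequence[float],
                    cells: Sequence[float]) -> List[float]:
    """Illustrative stand-in for a fitted model (NOT Random Forests).

    Each cell is described by a single environmental value; probability
    falls off linearly with distance from the mean presence value. The
    pseudoabsences argument is ignored by this toy model.
    """
    centre = sum(presences) / len(presences)
    return [max(0.0, 1.0 - abs(c - centre) / 5.0) for c in cells]


if __name__ == "__main__":
    rng = random.Random(0)
    extent = [i / 10 for i in range(100)]   # hypothetical study extent
    presences = [4.8, 5.0, 5.2]             # hypothetical presence records
    print(sample_pseudoabsences(extent, 5, rng))
    print(two_step_pseudoabsences(extent, presences, 5, toy_fit_predict, rng))
```

Passing the surveyed plots without detection as `candidates`, instead of the full extent, gives strategies 3 and 4. Whether presence cells are excluded from the pool beforehand is left to the caller here, since the text does not specify it.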
Pseudoabsence generation strategy strongly affected the area predicted as present in species distribution models and may be one of the most influential determinants of model outcomes. All the pseudoabsence strategies produced mean AUC values of at least 0.87. More important than the accuracy metrics, the two-step strategies over-predicted species presence because of too much environmental distance between the pseudoabsences and recorded presences, whereas models based on random pseudoabsences under-predicted species presence because of too little environmental distance between the pseudoabsences and recorded presences. Models using pseudoabsences from surveyed plots produced a balance between areas with high and low predicted probabilities and the strongest relationship between density and area with predicted probabilities ≥75%. Because accuracy assessment is imperfect, the best assessment currently may be to evaluate whether the species has been sufficiently but not excessively predicted to occur.
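One simple way to check whether a model predicts a species sufficiently but not excessively is to summarize how much of the study area falls into the high (≥75%) and low (<25%) predicted-probability classes. The sketch below is only illustrative: the function name is hypothetical, and it assumes every cell covers an equal area, which the text does not state.

```python
from typing import Dict, Sequence


def area_fractions(probs: Sequence[float],
                   low: float = 0.25,
                   high: float = 0.75) -> Dict[str, float]:
    """Share of cells with predicted probability < low and >= high.

    Assumes equal-area cells, so cell counts stand in for area.
    """
    if not probs:
        raise ValueError("probs must contain at least one prediction")
    n = len(probs)
    return {
        "low": sum(p < low for p in probs) / n,
        "high": sum(p >= high for p in probs) / n,
    }


if __name__ == "__main__":
    # Hypothetical predicted probabilities for four cells
    print(area_fractions([0.1, 0.2, 0.8, 0.5]))
```

Comparing these fractions across pseudoabsence strategies for the same species shows which strategy pushes the predictions toward over- or under-prediction.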