Hengl Tomislav, Walsh Markus G, Sanderman Jonathan, Wheeler Ichsani, Harrison Sandy P, Prentice Iain C
Envirometrix Ltd, Wageningen, Netherlands.
The Earth Institute, Columbia University, New York, NY, USA.
PeerJ. 2018 Aug 22;6:e5457. doi: 10.7717/peerj.5457. eCollection 2018.
Potential natural vegetation (PNV) is the vegetation cover in equilibrium with climate that would exist at a given location if it were not impacted by human activities. PNV is useful for raising public awareness about land degradation and for estimating land potential. This paper presents the results of assessing five machine learning algorithms for operational mapping of PNV: neural networks (nnet package), random forest (ranger), gradient boosting (gbm), K-nearest neighbors (class) and Cubist. Three case studies were considered: (1) the global distribution of biomes based on the BIOME 6000 data set (8,057 modern pollen-based site reconstructions), (2) the distribution of forest tree taxa in Europe based on detailed occurrence records (1,546,435 ground observations), and (3) global monthly values of the fraction of absorbed photosynthetically active radiation (FAPAR) (30,301 randomly sampled points). A stack of 160 global maps representing biophysical conditions over land, including atmospheric, climatic, relief, and lithologic variables, served as the explanatory variables. The results indicate that random forest gives the best overall performance. The highest accuracy for predicting the 20 BIOME 6000 classes was estimated to be between 33% (with spatial cross-validation) and 68% (with simple random sub-setting), with the most important predictors being total annual precipitation, monthly temperatures, and bioclimatic layers. Predicting 73 forest tree species resulted in a mapping accuracy of 25%, with the most important predictors being monthly cloud fraction, mean annual and monthly temperatures, and elevation. Regression models for FAPAR (monthly images) gave an R-square of 90%, with the most important predictors being, in order, total annual precipitation, monthly cloud fraction, CHELSA bioclimatic layers, and month of the year. Further developments of PNV mapping could include using all GBIF records to map the global distribution of plant species at different taxonomic levels. This methodology could also be extended to dynamic modeling of PNV, so that future climate scenarios can be incorporated. Global maps of biomes, FAPAR and tree species at 1 km spatial resolution are available for download via http://dx.doi.org/10.7910/DVN/QQHCIK.
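The gap between the 33% (spatial cross-validation) and 68% (simple random sub-setting) biome accuracies comes from how test points are held out. With random sub-setting, a held-out site usually has close neighbours in the training data. Spatial cross-validation holds out whole regions, so a model has to predict farther from its training sites. The sketch below is a minimal stdlib-only illustration of that difference and is not the paper's procedure. The 10-degree tile size, the planar distance in degrees, k = 5, and the synthetic clustered sites (`make_clustered_sites`) are all hypothetical choices made for illustration.

```python
import math
import random


def random_folds(n_points, k, seed=0):
    """Simple random sub-setting: balanced fold ids shuffled over points."""
    folds = [i % k for i in range(n_points)]
    random.Random(seed).shuffle(folds)
    return folds


def spatial_block_folds(coords, k, block_deg=10.0, seed=0):
    """Spatial blocking (hypothetical 10-degree tiles): every point in the
    same lon/lat tile is assigned to the same fold."""
    tiles = [(int((lon + 180.0) // block_deg), int((lat + 90.0) // block_deg))
             for lon, lat in coords]
    unique_tiles = sorted(set(tiles))
    random.Random(seed).shuffle(unique_tiles)
    # Whole tiles are dealt to folds in round-robin order.
    tile_fold = {tile: i % k for i, tile in enumerate(unique_tiles)}
    return [tile_fold[t] for t in tiles]


def mean_nearest_train_distance(coords, folds):
    """For each point, distance (degrees, planar approximation) to the nearest
    point in a different fold, i.e. the nearest training point when that point
    is held out; averaged over all points."""
    total = 0.0
    for i, (x, y) in enumerate(coords):
        total += min(math.hypot(x - u, y - v)
                     for j, (u, v) in enumerate(coords) if folds[j] != folds[i])
    return total / len(coords)


def make_clustered_sites(n_clusters=20, per_cluster=10, seed=42):
    """Hypothetical synthetic sites: tight clusters, mimicking the spatial
    clustering of field records."""
    rng = random.Random(seed)
    coords = []
    for _ in range(n_clusters):
        cx, cy = rng.uniform(-170.0, 170.0), rng.uniform(-60.0, 70.0)
        coords += [(cx + rng.gauss(0.0, 1.0), cy + rng.gauss(0.0, 1.0))
                   for _ in range(per_cluster)]
    return coords


if __name__ == "__main__":
    sites = make_clustered_sites()
    rf = random_folds(len(sites), k=5)
    sf = spatial_block_folds(sites, k=5)
    print("random sub-setting:", round(mean_nearest_train_distance(sites, rf), 2))
    print("spatial blocking  :", round(mean_nearest_train_distance(sites, sf), 2))
```

On clustered data, the spatial scheme leaves held-out points much farther from their nearest training point. This is why accuracy estimated with spatial cross-validation tends to be lower and more conservative than with random sub-setting.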