AMAP, Univ Montpellier, IRD, CNRS, INRAE, CIRAD, Montpellier, France.
CIRAD, UPR Forêts et Sociétés, F-34398, Montpellier, France.
Nat Commun. 2020 Sep 11;11(1):4540. doi: 10.1038/s41467-020-18321-y.
Mapping aboveground forest biomass is central to assessing the global carbon balance. However, current large-scale maps show strong disparities, despite good validation statistics of their underlying models. Here, we attribute this contradiction to a flaw in the validation methods, which ignore spatial autocorrelation (SAC) in the data, leading to an overoptimistic assessment of model predictive power. To illustrate this issue, we reproduce the approach of large-scale mapping studies using a massive forest inventory dataset of 11.8 million trees in central Africa to train and validate a random forest model based on multispectral and environmental variables. A standard nonspatial validation method suggests that the model predicts more than half of the forest biomass variation, while spatial validation methods accounting for SAC reveal quasi-null predictive power. This study underscores how a common practice in big data mapping studies can yield apparently high predictive power, even when predictors have poor relationships with the ecological variable of interest, thus possibly leading to erroneous maps and interpretations.
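The contrast between nonspatial and spatial validation can be illustrated on synthetic data. The following is a minimal sketch, not the authors' pipeline: it assumes clustered sample plots, a spatially autocorrelated response, and six spatially autocorrelated predictors that are independent of the response by construction. The helper names (`smooth_field`, `pooled_cv_r2`, `buffered_block_splits`), the 50-unit blocks, and the 10-unit buffer are all hypothetical choices made for illustration. It compares shuffled K-fold cross-validation with one simple spatial scheme, block hold-out with a buffer, using scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold


def smooth_field(xy, rng, n_waves=8, min_wavelength=15.0, max_wavelength=40.0):
    """Spatially autocorrelated surface built as a sum of random plane waves."""
    wavelength = rng.uniform(min_wavelength, max_wavelength, n_waves)
    angle = rng.uniform(0.0, 2.0 * np.pi, n_waves)
    phase = rng.uniform(0.0, 2.0 * np.pi, n_waves)
    kx = 2.0 * np.pi * np.cos(angle) / wavelength
    ky = 2.0 * np.pi * np.sin(angle) / wavelength
    waves = np.sin(np.outer(xy[:, 0], kx) + np.outer(xy[:, 1], ky) + phase)
    return waves.sum(axis=1) / np.sqrt(n_waves)


def pooled_cv_r2(X, y, splits):
    """Fit a random forest per split and score all held-out predictions together."""
    pred = np.full(len(y), np.nan)
    for train, test in splits:
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[train], y[train])
        pred[test] = model.predict(X[test])
    return r2_score(y, pred)


def buffered_block_splits(xy, extent=100.0, block_size=50.0, buffer=10.0):
    """Hold out one square block at a time; drop training plots that lie
    within `buffer` of any held-out plot."""
    edges = np.arange(block_size, extent, block_size)
    ix = np.digitize(xy[:, 0], edges)
    iy = np.digitize(xy[:, 1], edges)
    block_id = ix * (len(edges) + 1) + iy
    for b in np.unique(block_id):
        test = np.flatnonzero(block_id == b)
        diff = xy[:, None, :] - xy[test][None, :, :]
        nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        train = np.flatnonzero(nearest > buffer)  # also excludes the test plots
        yield train, test


rng = np.random.default_rng(0)

# Clustered sampling design (assumed): 150 clusters of 10 nearby plots each.
centers = rng.uniform(0.0, 100.0, size=(150, 2))
xy = np.repeat(centers, 10, axis=0) + rng.normal(0.0, 0.3, size=(1500, 2))

# Response: autocorrelated field plus a little noise.
y = smooth_field(xy, rng) + rng.normal(0.0, 0.1, len(xy))

# Predictors: autocorrelated fields generated independently of the response.
X = np.column_stack([smooth_field(xy, rng) for _ in range(6)])

random_splits = KFold(n_splits=5, shuffle=True, random_state=0).split(X)
r2_random = pooled_cv_r2(X, y, random_splits)
r2_spatial = pooled_cv_r2(X, y, buffered_block_splits(xy))

print(f"shuffled K-fold R2:        {r2_random:.2f}")
print(f"buffered block hold-out R2: {r2_spatial:.2f}")
```

On data built this way, the shuffled K-fold score is expected to be far higher than the buffered block score, although the predictors carry no information about the response. The shuffled folds place plots from the same cluster in both the training and test sets, so the forest can match a held-out plot to its neighbours through their near-identical predictor values.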