Cluster of Excellence "Machine Learning: New Perspectives for Science", Eberhard Karls University Tuebingen, Maria von Linden Str. 6, 72076, Tübingen, Germany.
Soil and Spatial Data Science, Soilution GbR, Heiligegeiststrasse 13, 06484, Quedlinburg, Germany.
Sci Rep. 2020 Oct 7;10(1):16737. doi: 10.1038/s41598-020-73773-y.
Two important theories in spatial modelling relate to structural and spatial dependence. Structural dependence refers to environmental state-factor models, in which an environmental property is modelled as a function of the states and interactions of environmental predictors such as climate, parent material or relief. Commonly, these functions are regression or supervised classification algorithms. Spatial dependence is present in most environmental properties and forms the basis for spatial interpolation and geostatistics. In machine learning, modelling with geographic coordinates or Euclidean distance fields, which resemble linear variograms with infinite ranges, can produce similar interpolations. Interpolations do not lend themselves to causal interpretation. Conversely, with structural modelling one can, potentially, extract knowledge from the model. Two important characteristics of such interpretable environmental modelling are scale and information content. Scale is relevant because very coarse-scale predictors can show nearly infinite ranges and thus fall outside what we call the information horizon, i.e. interpretation using domain knowledge is not possible. Regarding information content, recent studies have shown that meaningless predictors, such as paintings or photographs of faces, can be used for spatial environmental modelling of ecological and soil properties with accurate evaluation statistics. Here, we examine under which conditions modelling with such predictors can lead to accurate statistics and whether an information horizon can be derived for scale and information content.
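Euclidean distance fields (EDFs) are simply distances from each sample location to a few fixed reference points, used as predictors alongside or instead of raw coordinates. A minimal sketch follows. It assumes NumPy and an illustrative covariate set of X, Y and the distances to the four bounding-box corners and to the centre. That set is an assumption, not necessarily the exact one used in this study, and the function name `euclidean_distance_fields` is hypothetical.

```python
import numpy as np


def euclidean_distance_fields(xy: np.ndarray) -> np.ndarray:
    """Return EDF covariates for an (n, 2) array of coordinates.

    Assumed (illustrative) covariate set: raw X and Y, plus the Euclidean
    distance from every point to each bounding-box corner and to the
    bounding-box centre, giving 7 columns in total.
    """
    xy = np.asarray(xy, dtype=float)
    (xmin, ymin), (xmax, ymax) = xy.min(axis=0), xy.max(axis=0)
    refs = np.array([
        [xmin, ymin], [xmin, ymax], [xmax, ymin], [xmax, ymax],  # corners
        [(xmin + xmax) / 2, (ymin + ymax) / 2],                  # centre
    ])
    # Broadcast (n, 1, 2) - (1, 5, 2) -> (n, 5, 2), then take the norm
    # over the coordinate axis to get point-to-reference distances (n, 5).
    dists = np.linalg.norm(xy[:, None, :] - refs[None, :, :], axis=2)
    return np.column_stack([xy, dists])


# Usage: a regular 5 x 5 grid over [0, 10] x [0, 10].
gx, gy = np.meshgrid(np.linspace(0, 10, 5), np.linspace(0, 10, 5))
xy = np.column_stack([gx.ravel(), gy.ravel()])
X = euclidean_distance_fields(xy)
print(X.shape)   # → (25, 7)
print(X[12, 6])  # point (5, 5) lies on the centre → 0.0
```

Each distance column keeps increasing away from its reference point and never levels off at a sill, which is the intuition behind the "infinite range" analogy. In practice the resulting matrix would be passed as predictors to a learner such as a random forest.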