Arribas-Bel Daniel, Patino Jorge E, Duque Juan C
Department of Geography & Planning, University of Liverpool, Liverpool, United Kingdom.
Research in Spatial Economics (RiSE-group), Department of Economics, Universidad EAFIT, Medellín, Colombia.
PLoS One. 2017 May 2;12(5):e0176684. doi: 10.1371/journal.pone.0176684. eCollection 2017.
This paper provides evidence on the usefulness of very high spatial resolution (VHR) imagery for gathering socioeconomic information in urban settlements. We use land cover, spectral, structure and texture features extracted from a Google Earth image of Liverpool (UK) to evaluate their potential to predict Living Environment Deprivation at a small statistical area level. We also contribute to the methodological literature on the estimation of socioeconomic indices with remote-sensing data by introducing elements from modern machine learning. In addition to classical approaches such as Ordinary Least Squares (OLS) regression and a spatial lag model, we explore the potential of the Gradient Boost Regressor and Random Forests to improve predictive performance and accuracy. Beyond these prediction methods, we also introduce tools for model interpretation and evaluation, such as feature importance, partial dependence plots and cross-validation. Our results show that Random Forest is the best-performing model, with an R² of around 0.54, followed by the Gradient Boost Regressor with 0.5. Both the spatial lag model and OLS fall behind, with lower R² values of 0.43 and 0.3, respectively.
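The abstract names cross-validation and R² as the basis for comparing models. As a rough illustration only, the sketch below shows how a k-fold cross-validated R² could be computed for a one-feature least-squares fit and for a constant (mean) baseline. It is not the authors' pipeline: the data are synthetic, and every name in it (`r2_score`, `fit_ols`, `fit_mean`, `cross_validated_r2`) is hypothetical and chosen for this example.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares; return a predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_mean(xs, ys):
    """Baseline that ignores x and always predicts the training mean."""
    y_bar = mean(ys)
    return lambda x: y_bar


def cross_validated_r2(xs, ys, fit, k=5, seed=0):
    """Average out-of-fold R^2 over k folds of shuffled indices."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index subsets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in held_out]
        scores.append(r2_score([ys[i] for i in held_out], preds))
    return mean(scores)


# Synthetic data (an assumption for illustration, not the Liverpool data):
# a linear signal plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 1.0 * x + rng.gauss(0, 1) for x in xs]

ols_r2 = cross_validated_r2(xs, ys, fit_ols)
baseline_r2 = cross_validated_r2(xs, ys, fit_mean)
print(f"OLS cross-validated R2:      {ols_r2:.3f}")
print(f"Baseline cross-validated R2: {baseline_r2:.3f}")
```

Because `cross_validated_r2` only needs a `fit` function that returns a predictor, other models, such as a random forest or a gradient boosting regressor from a library, could be scored in the same way on the same folds, so their R² values remain comparable.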