Virro Holger, Kmoch Alexander, Vainu Marko, Uuemaa Evelyn
Department of Geography, Institute of Ecology and Earth Sciences, University of Tartu, Vanemuise 46, Tartu 51003, Estonia.
Sci Total Environ. 2022 Sep 20;840:156613. doi: 10.1016/j.scitotenv.2022.156613. Epub 2022 Jun 11.
Nutrient runoff from agricultural production is one of the main causes of water quality deterioration in river systems and coastal waters. Water quality modeling can provide insight into water quality issues and thereby support effective mitigation efforts. Process-based nutrient models are very complex, requiring many input parameters and computationally expensive calibration. Recently, machine learning (ML) approaches have been shown to achieve accuracy comparable to that of process-based models and even to outperform them when describing nonlinear relationships. We used observations from 242 Estonian catchments, amounting to 469 yearly total nitrogen (TN) and 470 yearly total phosphorus (TP) measurements covering the period 2016-2020, to train random forest (RF) models for predicting annual N and P concentrations. We used a total of 82 predictor variables, including land cover, soil, climate and topography parameters, and applied a feature selection strategy to reduce the number of dependent features in the models. The SHAP (SHapley Additive exPlanations) method was used to identify the most relevant predictors. The performance of our models is comparable to that of previous process-based models used in the Baltic region, with the TN and TP models having R² scores of 0.83 and 0.52, respectively. However, as the input data used in our models are easier to obtain, the models offer superior applicability in areas where data availability is insufficient for process-based approaches. Therefore, the models enable robust estimation of nutrient losses at the national level and capture the spatial variability of nutrient runoff, which in turn provides decision-making support for regional water management plans.
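The abstract does not say which feature selection strategy was applied or how the reported scores were computed. As an illustration only, the following minimal sketch assumes a greedy pairwise-correlation filter for removing dependent predictors and the usual coefficient of determination as the score. It does not train a random forest, and all names, the 0.9 threshold, and the toy values are hypothetical rather than taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Greedily keep predictors that are not strongly correlated with any kept one.

    features: dict mapping predictor name -> list of values (one per catchment).
    Dict order decides which member of a correlated pair survives.
    The threshold is an assumed value, not one reported in the abstract.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy predictors for four catchments (not data from the study).
toy_features = {
    "arable_share": [0.10, 0.40, 0.60, 0.80],
    "cropland_share": [0.12, 0.41, 0.59, 0.83],  # nearly duplicates arable_share
    "mean_slope": [5.0, 1.0, 4.0, 2.0],
}
selected = drop_correlated(toy_features)
print(selected)
```

In this toy case the near-duplicate `cropland_share` is dropped because it is almost perfectly correlated with `arable_share`, while `mean_slope` is retained. A filter of this kind is only one of several plausible ways to reduce dependent features before fitting a model.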