Department of Agricultural Engineering, Federal University of Viçosa, Viçosa, 36570-900, MG, Brazil.
J Environ Manage. 2021 Jul 15;290:112625. doi: 10.1016/j.jenvman.2021.112625. Epub 2021 Apr 22.
There are different methods for predicting streamflow, and machine learning has recently been widely used for this purpose. This technique uses a large set of covariables in the prediction process, and these must undergo selection to increase the precision and stability of the models. Thus, this work aimed to analyze the effect of covariable selection with Recursive Feature Elimination (RFE) and Forward Feature Selection (FFS) on the performance of machine learning models that predict daily streamflow. The study was carried out in the Piranga river basin, located in the State of Minas Gerais, Brazil. The database consisted of an 18-year historical series (2000-2017) of streamflow data at the basin outlet, together with covariables derived from the streamflow of tributary rivers, precipitation, land use and land cover, MODIS sensor products, and time. Highly correlated covariables were eliminated, and covariables were then selected by level of importance with the RFE and FFS methods for the Multivariate Adaptive Regression Splines (EARTH), Multiple Linear Regression (MLR), and Random Forest (RF) models. The data were partitioned into two groups: 75% for training and 25% for validation. Each model was run 50 times, and its performance was evaluated with the Nash-Sutcliffe efficiency coefficient (NSE), the coefficient of determination (R²), and the Root Mean Square Error (RMSE). All three models performed satisfactorily with both covariable selection methods; however, all of them were inaccurate in predicting values associated with maximum streamflow events. In most cases, FFS improved model performance and reduced the number of selected covariables. Machine learning proved to be efficient for predicting daily streamflow, and using FFS for covariable selection enhanced this efficiency.
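The selection-and-evaluation workflow summarized above (correlation filtering, forward selection, a 75/25 split, and scoring by NSE, R² and RMSE) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the MLR model (ordinary least squares), uses a generic greedy forward selection scored by NSE on an inner split of the training rows, assumes a |r| ≥ 0.9 correlation cutoff, and runs on synthetic data. The helper names (`nse`, `r2`, `rmse`, `drop_correlated`, `forward_selection`) and the threshold are hypothetical, and R² is taken here as the squared Pearson correlation between observed and simulated values.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r2(obs, sim):
    """Coefficient of determination, taken here as the squared Pearson correlation."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)


def rmse(obs, sim):
    """Root mean square error, in the units of the streamflow series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def drop_correlated(X, threshold=0.9):
    """Keep a column only if its |Pearson r| with every already-kept column is below threshold.

    The 0.9 cutoff is a hypothetical choice; the abstract does not state the value used.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


def fit_mlr(X, y):
    """Ordinary least squares with an intercept column (the MLR model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def forward_selection(X_fit, y_fit, X_sel, y_sel):
    """Generic greedy FFS sketch: repeatedly add the covariable that most improves NSE
    on the selection split; stop when no remaining candidate improves it."""
    remaining, selected, best = list(range(X_fit.shape[1])), [], -np.inf
    while remaining:
        trials = []
        for j in remaining:
            cols = selected + [j]
            coef = fit_mlr(X_fit[:, cols], y_fit)
            trials.append((nse(y_sel, predict_mlr(coef, X_sel[:, cols])), j))
        score, j = max(trials)
        if score <= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected


# Synthetic demo (not the study's data): y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=400)

kept = drop_correlated(X)
idx = rng.permutation(len(y))
n_train = int(0.75 * len(y))            # 75% training / 25% validation
tr, va = idx[:n_train], idx[n_train:]
inner = n_train * 2 // 3                # inner split of training rows used only for selection
fit_rows, sel_rows = tr[:inner], tr[inner:]

Xk = X[:, kept]
chosen = forward_selection(Xk[fit_rows], y[fit_rows], Xk[sel_rows], y[sel_rows])
coef = fit_mlr(Xk[tr][:, chosen], y[tr])   # refit on all training rows
pred = predict_mlr(coef, Xk[va][:, chosen])
print("selected columns:", [kept[c] for c in chosen])
print(f"NSE={nse(y[va], pred):.3f}  R2={r2(y[va], pred):.3f}  RMSE={rmse(y[va], pred):.3f}")
```

Scoring the selection on an inner split keeps the 25% validation rows untouched until the final evaluation. Repeating the random split with different seeds (for example, 50 times) and averaging the metrics would roughly mirror the repeated model runs described in the abstract; RFE, EARTH and RF are not shown here.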