Wegayehu Eyob Betru, Muluneh Fiseha Behulu
School of Civil and Environmental Engineering, Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa, Ethiopia.
Heliyon. 2023 Jul 6;9(7):e17982. doi: 10.1016/j.heliyon.2023.e17982. eCollection 2023 Jul.
Traditional data-driven streamflow prediction usually relies on a single model whose performance is inconsistent across different variability conditions. Model ensembles, which merge the benefits of different models without losing the general character of the data, are therefore becoming a trend in hydrology. This study compared three super ensemble learners with eight base models. Twelve years of monthly rolled daily time series data from three river catchments in Ethiopia (Borkena watershed, Awash river basin; Gummera watershed, Abay river basin; and Sore watershed, Baro Akobo river basin) were used for single-step daily streamflow simulation, with the previous thirty days as input timesteps. Five input scenarios were applied: three vegetation indices, three remote sensing-based precipitation products, ground-gauged rainfall, all inputs fused, and inputs selected with the Recursive Feature Elimination (RFE) algorithm. The time series was divided into training and testing datasets at a ratio of 80:20. Model performance was evaluated using the Root Mean Squared Error (RMSE), coefficient of determination (R²), Mean Absolute Error (MAE), and Median Absolute Error (MEDAE), and the results are presented for each of the five input scenarios. Performance averaged across catchments and input scenarios indicated that the three super ensemble learners outperformed the eight base models and remained relatively stable. The top-ranked WASE model exceeded the linear regression baseline by 13.3%. Among the base models, XGB, CNN-GRU, and LSTM performed best. The study also revealed that LSTM's key downside is a drop in performance when no feature selection is applied. In comparison, XGB showed superior performance by controlling redundant inputs internally. Moreover, this study uniquely highlights the potential of remote sensing-based vegetation indices for data-driven streamflow modelling in ungauged catchments with no meteorological time series.
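The data preparation and evaluation steps described in the abstract (thirty-day input windows, single-step daily targets, an 80:20 split, and the four error metrics) can be illustrated with a minimal NumPy sketch. This is not the authors' code. The function names, the synthetic rainfall and flow series, the chronological (unshuffled) split, and the definition of R² as 1 - SS_res/SS_tot are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def make_windows(features, target, lookback=30):
    """Pair each run of `lookback` days of inputs with the next day's target."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        features = features[:, None]  # treat a single series as one feature
    n = len(target) - lookback
    if n <= 0:
        raise ValueError("series is shorter than the lookback window")
    # X[i] covers days i .. i+lookback-1; y[i] is the flow on day i+lookback
    X = np.stack([features[i:i + lookback] for i in range(n)])
    y = target[lookback:]
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling, so the test period follows the training period.
    (Assumption: the abstract gives only the 80:20 ratio.)"""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def evaluate(y_true, y_pred):
    """RMSE, R2, MAE, and MEDAE; R2 is assumed to be 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
        "MAE": float(np.mean(np.abs(err))),
        "MEDAE": float(np.median(np.abs(err))),
    }


# Synthetic stand-in data (hypothetical, not from the study)
rng = np.random.default_rng(0)
days = 400
rain = rng.gamma(2.0, 2.0, size=days)
flow = np.convolve(rain, np.ones(5) / 5, mode="same")

X, y = make_windows(rain, flow, lookback=30)
X_train, X_test, y_train, y_test = chronological_split(X, y)

# Trivial reference prediction: the mean of the training flows
baseline = np.full_like(y_test, y_train.mean())
scores = evaluate(y_test, baseline)
print(X.shape, X_train.shape, X_test.shape)
print(scores)
```

Any of the base or ensemble models would take the place of the constant baseline here; the windowing, split, and scoring stay the same, which is what makes scores comparable across models and input scenarios.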