School of Environmental Engineering, University of Seoul, Dongdaemun-gu, Seoul 02504, Republic of Korea.
Yeongsan River Environment Research Center, National Institute of Environmental Research, 5, Cheomdangwagi-ro 208 beon-gil, Buk-gu, Gwangju 61011, Republic of Korea.
Water Res. 2022 May 15;215:118289. doi: 10.1016/j.watres.2022.118289. Epub 2022 Mar 12.
Routine monitoring for harmful algal blooms (HABs) is generally undertaken at a low temporal frequency (e.g., weekly to monthly) that is unsuitable for capturing highly dynamic variations in cyanobacteria abundance. Therefore, we developed a model incorporating reverse time attention with a decay mechanism (RETAIN-D) to forecast HABs with simultaneous improvements in temporal resolution, forecasting performance, and interpretability. The usefulness of RETAIN-D in forecasting HABs was illustrated by its application to two sites located in the lower sections of the Nakdong and Yeongsan rivers, South Korea, where HABs pose a critical water quality issue. Three variations of recurrent neural network models, i.e., long short-term memory (LSTM), gated recurrent unit (GRU), and reverse time attention (RETAIN), were adopted for comparisons of performance with RETAIN-D. Input features encompassing meteorological, hydrological, environmental, and biological factors were used to forecast cyanobacteria abundance (total cyanobacteria cell counts and cell counts of dominant cyanobacteria taxa). Incorporation of a decay mechanism into the deep learning structure in RETAIN-D allowed forecasts of HABs at a high (daily) temporal resolution without manual feature engineering, increasing the usefulness of the resulting forecasts for water quality and resources management. RETAIN-D yielded a high degree of accuracy (RMSE = 0.29-1.67, R = 0.76-0.98, MAE = 0.18-1.14, SMAPE = 9.77-87.94% for test sets; on natural log scales) across model outputs and sites, successfully capturing high variability and irregularities in the time series. RETAIN-D showed higher accuracy than RETAIN (except for comparable accuracy in forecasting Microcystis abundance at the Nakdong River site) and outperformed LSTM and GRU across all model outputs and sites.
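As a rough illustration of how the four reported accuracy measures might be computed on a natural log scale, the following is a minimal sketch and not the authors' code. The function names `to_log_scale` and `forecast_metrics`, the use of ln(count + 1) to accommodate zero counts, and the particular SMAPE definition (mean of 2|p - o| / (|o| + |p|), in percent) are all assumptions; the abstract states only that the metrics were evaluated on natural log scales.

```python
import math


def to_log_scale(cell_counts):
    """Natural-log transform of cell counts.

    Assumption: ln(count + 1) is used so that zero counts stay defined;
    the abstract does not say how zeros were handled.
    """
    return [math.log1p(c) for c in cell_counts]


def forecast_metrics(observed, predicted):
    """Return RMSE, Pearson R, MAE and SMAPE (%) for two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation coefficient between observations and forecasts.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("Pearson R is undefined for a constant sequence")
    r = cov / math.sqrt(var_o * var_p)

    # SMAPE (assumed variant): a term counts as 0 when both values are 0.
    terms = []
    for o, p in zip(observed, predicted):
        denom = abs(o) + abs(p)
        terms.append(0.0 if denom == 0 else 2 * abs(p - o) / denom)
    smape = 100.0 * sum(terms) / n

    return {"rmse": rmse, "r": r, "mae": mae, "smape": smape}


if __name__ == "__main__":
    # Hypothetical cell counts (cells/mL), for illustration only.
    obs = to_log_scale([0, 120, 3500, 48000, 900])
    pred = to_log_scale([10, 200, 2800, 39000, 1500])
    print(forecast_metrics(obs, pred))
```

Because SMAPE divides by the magnitude of the values themselves, it can become large when log-scale abundances are near zero, which may be one reason its reported range is much wider than those of the other metrics.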
Ambient temperature had high importance in forecasting cyanobacteria abundance across all model outputs and sites, whereas the relative importance of other input features varied by output and site. Increases in contributions with increasing irradiance, decreasing flow rates, and increasing residence time were more pronounced in summer than in other seasons. Differences in the contributions of input features among time steps (1 to 7 days prior to forecasting) were larger at the Yeongsan River site than at the Nakdong River site. RETAIN-D is applicable to a wide range of forecasting models that can benefit from improved temporal resolution, performance, and interpretability.
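The abstract does not specify the form of the decay mechanism, so the sketch below is only one plausible illustration of the general idea: letting the influence of a sparsely sampled (e.g., weekly) observation fade with the number of days since it was taken, so that a daily input series can be built without hand-crafted interpolation. It follows the input-decay idea used in GRU-D-style models, with a fixed rather than learned decay rate. The function name `decay_fill`, the fixed `decay_rate`, and the `fallback` value (for example, a long-term mean) are all hypothetical and are not taken from the paper.

```python
import math


def decay_fill(daily_values, decay_rate, fallback):
    """Fill missing days (None) by decaying the last observation toward `fallback`.

    For a gap of `d` days since the last observation x_last:
        gamma = exp(-decay_rate * d)
        x_hat = gamma * x_last + (1 - gamma) * fallback
    Assumption: in a trained model the decay rate would be learned;
    here it is a fixed constant for illustration.
    """
    filled = []
    last_obs = None
    gap = 0
    for value in daily_values:
        if value is not None:
            # Observed day: use the measurement and reset the gap counter.
            last_obs = value
            gap = 0
            filled.append(value)
        elif last_obs is None:
            # Nothing observed yet: fall back to the default value.
            filled.append(fallback)
        else:
            gap += 1
            gamma = math.exp(-decay_rate * gap)
            filled.append(gamma * last_obs + (1 - gamma) * fallback)
    return filled


if __name__ == "__main__":
    # Hypothetical weekly measurements placed on a daily grid (None = no sample).
    series = [8.0, None, None, None, None, None, None, 4.0, None, None]
    print(decay_fill(series, decay_rate=0.3, fallback=5.0))
```

With a larger `decay_rate`, stale observations are trusted for a shorter time and the filled values approach `fallback` more quickly; with a rate of zero, the scheme reduces to carrying the last observation forward.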