Department of Civil, Environmental and Plant Engineering, Konkuk University, Seoul 05029, Republic of Korea.
Office for Busan Region Management of the Nakdong River, Korea Water Resources Corporation (K-water), Busan 49300, Republic of Korea.
Water Res. 2021 Dec 1;207:117821. doi: 10.1016/j.watres.2021.117821. Epub 2021 Oct 30.
Many countries have attempted to monitor and predict harmful algal blooms to mitigate related problems and establish management practices. The current alert system, based on sampling of cell density, is used to indicate the bloom status and to inform a rapid and adequate response from water-associated organizations. The objective of this study was to develop an early warning system for cyanobacterial blooms to allow for efficient decision making prior to the occurrence of algal blooms and to guide preemptive management actions. In this study, two machine learning models, an artificial neural network (ANN) and a support vector machine (SVM), were constructed for the timely prediction of algal bloom alert levels using eight years' worth of meteorological, hydrodynamic, and water quality data from a reservoir where harmful cyanobacterial blooms frequently occur during summer. However, the imbalanced proportions of the alert levels in the output variable lead to biased training of the data-driven models and degrade their prediction performance. Therefore, synthetic data generated by the adaptive synthetic (ADASYN) sampling method were used to resolve the imbalance of minority class data in the original data and to improve the prediction performance of the models. The results showed that the overall prediction performance for the caution level (L1) and warning level (L2) was higher in the models constructed using a combination of original and synthetic data than in the models constructed using original data only. In particular, the optimal ANN and SVM constructed using a combination of original and synthetic data showed distinctly improved recall and precision values for L1 during both training (including validation) and testing; L1 is a critical alert level because it indicates a transition from normalcy to bloom formation.
In addition, both optimal models constructed using synthetic-added data improved recall and precision by more than 33.7% when predicting L1 and L2 during the test. Therefore, the application of synthetic data can improve the detection performance of machine learning models by resolving the imbalance of observed data. Reliable predictions from the improved models can aid the design of management practices to mitigate algal blooms within a reservoir.
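The oversampling and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`k_nearest`, `adasyn`, `precision_recall`), the parameters `k` and `beta`, and the two-feature toy data are all assumptions made for illustration. The sketch follows the general ADASYN idea of generating more synthetic points around minority samples whose neighbourhoods are dominated by other classes, then shows how per-level precision and recall are computed.

```python
import math
import random
from collections import Counter


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def adasyn(X, y, minority, k=2, beta=1.0, seed=0):
    """Return synthetic samples for the `minority` label (ADASYN-style sketch)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    largest = max(Counter(y).values())
    # Number of synthetic samples needed to approach the largest class size.
    total = round((largest - len(minority_idx)) * beta)

    # Difficulty of each minority sample: share of other-class points
    # among its k nearest neighbours in the full data set.
    ratios = []
    for i in minority_idx:
        neighbours = k_nearest(i, X, k)
        ratios.append(sum(y[j] != minority for j in neighbours) / k)
    norm = sum(ratios)
    if norm == 0 or total <= 0:
        return []

    min_points = [X[i] for i in minority_idx]
    synthetic = []
    for pos, ratio in enumerate(ratios):
        count = round(ratio / norm * total)  # harder samples get more copies
        partners = k_nearest(pos, min_points, k)
        if not partners:
            continue
        base = min_points[pos]
        for _ in range(count):
            other = min_points[rng.choice(partners)]
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append([b + lam * (o - b) for b, o in zip(base, other)])
    return synthetic


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one alert level."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (made up): two scaled features per sample, alert level as label.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [1.3, 1.1], [1.5, 1.4], [1.1, 1.3]]
y = ["L0"] * 6 + ["L1"] * 3

synthetic = adasyn(X, y, minority="L1")
X_aug = X + synthetic
y_aug = y + ["L1"] * len(synthetic)
```

The augmented set (`X_aug`, `y_aug`) would be used only for training; the test set should stay free of synthetic samples so that recall and precision for L1 and L2 reflect observed data. In practice a maintained library implementation of ADASYN would normally be preferred over a hand-written one.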