Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method.

Affiliations

Department of Civil, Environmental and Plant Engineering, Konkuk University, Seoul 05029, Republic of Korea.

Office for Busan Region Management of the Nakdong River, Korea Water Resources Corporation (K-water), Busan 49300, Republic of Korea.

Publication information

Water Res. 2021 Dec 1;207:117821. doi: 10.1016/j.watres.2021.117821. Epub 2021 Oct 30.

DOI:10.1016/j.watres.2021.117821

PMID:34781184

Abstract

Many countries have attempted to monitor and predict harmful algal blooms to mitigate related problems and establish management practices. The current alert system, based on sampling of cell density, is used to indicate the bloom status and to prompt rapid and adequate responses from water-associated organizations. The objective of this study was to develop an early warning system for cyanobacterial blooms that allows efficient decision making before algal blooms occur and guides preemptive management actions. In this study, two machine learning models, an artificial neural network (ANN) and a support vector machine (SVM), were constructed for the timely prediction of algal bloom alert levels using eight years' worth of meteorological, hydrodynamic, and water quality data from a reservoir where harmful cyanobacterial blooms frequently occur during summer. However, the imbalanced proportions of the alert levels in the output variable lead to biased training of the data-driven models and degrade their prediction performance. Therefore, synthetic data generated by an adaptive synthetic (ADASYN) sampling method were used to resolve the imbalance of minority-class data in the original data and to improve the prediction performance of the models. The results showed that the overall prediction performance for the caution level (L1) and warning level (L2) was higher in the models constructed using a combination of original and synthetic data than in the models constructed using original data only. In particular, the optimal ANN and SVM constructed using a combination of original and synthetic data during both training (including validation) and testing yielded distinctly improved recall and precision for L1, a critical alert level because it indicates the transition from normal conditions to bloom formation.
In addition, both optimal models constructed using data augmented with synthetic samples improved recall and precision by more than 33.7% when predicting L1 and L2 during the test. Therefore, synthetic data can improve the detection performance of machine learning models by resolving the imbalance in observed data. Reliable predictions from the improved models can aid the design of management practices to mitigate algal blooms within a reservoir.
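The abstract names ADASYN but does not describe how it generates minority-class samples. The following is a minimal NumPy sketch of the ADASYN recipe as commonly described (He et al., 2008): minority points surrounded by more samples from other classes receive proportionally more synthetic neighbours, each created by interpolating towards a random minority-class neighbour. It is not the authors' code; the function names, the `beta` and `n_neighbors` defaults, the uniform fallback when every neighbourhood is pure, and the toy data (class 0 standing in for "normal", 1 for L1, 2 for L2) are all assumptions for illustration. A maintained implementation is available as `imblearn.over_sampling.ADASYN` in imbalanced-learn; the abstract does not state which tooling the study used.

```python
import numpy as np


def adasyn(X, y, minority_label, n_neighbors=5, beta=1.0, rng=None):
    """ADASYN oversampling for one minority class; returns only synthetic rows."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    idx_min = np.flatnonzero(is_min)
    X_min = X[idx_min]
    n_min = len(X_min)
    n_target = np.unique(y, return_counts=True)[1].max()  # largest class size
    G = int(round((n_target - n_min) * beta))  # total synthetic samples wanted
    if G <= 0 or n_min < 2:
        return np.empty((0, X.shape[1])), np.empty(0, dtype=y.dtype)

    # Step 1: K nearest neighbours of each minority sample in the FULL data set.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(n_min), idx_min] = np.inf  # exclude the sample itself
    k_all = min(n_neighbors, len(X) - 1)
    nn_all = np.argsort(d_all, axis=1)[:, :k_all]

    # Step 2: "difficulty" r_i = share of neighbours that belong to other classes.
    r = (~is_min[nn_all]).sum(axis=1) / k_all
    if r.sum() == 0:  # assumption: uniform weights if all neighbourhoods are pure
        r = np.ones(n_min)
    r = r / r.sum()
    g = np.rint(r * G).astype(int)  # synthetic samples per minority point

    # Step 3: interpolate towards random neighbours from the SAME minority class.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(n_neighbors, n_min - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]

    synth = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            z = X_min[rng.choice(nn_min[i])]
            lam = rng.random()  # uniform in [0, 1)
            synth.append(X_min[i] + lam * (z - X_min[i]))
    X_new = np.array(synth, dtype=float).reshape(-1, X.shape[1])
    return X_new, np.full(len(X_new), minority_label, dtype=y.dtype)


def adasyn_balance(X, y, n_neighbors=5, beta=1.0, seed=0):
    """Oversample every non-majority class with ADASYN and append the results."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [X], [y]
    for label, count in zip(labels, counts):
        if count < counts.max():
            X_new, y_new = adasyn(X, y, label, n_neighbors, beta, rng)
            X_parts.append(X_new)
            y_parts.append(y_new)
    return np.vstack(X_parts), np.concatenate(y_parts)


# Hypothetical toy data: class 0 = normal, 1 = L1 (caution), 2 = L2 (warning).
demo_rng = np.random.default_rng(42)
X = np.vstack([demo_rng.normal(0.0, 1.0, (200, 2)),
               demo_rng.normal(1.5, 1.0, (20, 2)),
               demo_rng.normal(3.0, 1.0, (10, 2))])
y = np.repeat([0, 1, 2], [200, 20, 10])
X_bal, y_bal = adasyn_balance(X, y, seed=0)
print(np.unique(y_bal, return_counts=True))
```

Because the per-point counts are rounded, each minority class ends up close to, not exactly at, the majority class size. Each synthetic row is a convex combination of two real minority rows, so it stays within that class's observed range; the original rows are left untouched at the front of the output.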

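The abstract reports results as per-alert-level recall and precision, and an improvement of "more than 33.7%" without saying whether that is a relative change or a difference in percentage points. The sketch below computes one-vs-rest precision and recall for each alert level and reports both kinds of change, so the two readings can be compared. The labels and predictions are hypothetical, chosen only to make the arithmetic easy to follow; scikit-learn's `precision_recall_fscore_support(y_true, y_pred, labels=..., average=None)` returns the same per-class precision and recall.

```python
import numpy as np


def per_class_precision_recall(y_true, y_pred, labels):
    """Return {label: (precision, recall)}, treating each label one-vs-rest."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = {}
    for label in labels:
        tp = np.sum((y_pred == label) & (y_true == label))  # correctly flagged
        fp = np.sum((y_pred == label) & (y_true != label))  # false alarms
        fn = np.sum((y_pred != label) & (y_true == label))  # missed alerts
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        scores[label] = (float(precision), float(recall))
    return scores


def change(old, new):
    """Absolute change in percentage points and relative change in percent."""
    points = (new - old) * 100.0
    relative = (new - old) / old * 100.0 if old > 0 else float("inf")
    return points, relative


# Hypothetical observed levels and predictions from two models.
y_true = ["L0", "L0", "L0", "L1", "L1", "L1", "L2", "L2"]
y_baseline = ["L0", "L0", "L0", "L0", "L0", "L1", "L2", "L0"]
y_augmented = ["L0", "L0", "L1", "L1", "L1", "L1", "L2", "L2"]

base = per_class_precision_recall(y_true, y_baseline, ["L1", "L2"])
aug = per_class_precision_recall(y_true, y_augmented, ["L1", "L2"])
print(base, aug)
print(change(base["L1"][1], aug["L1"][1]))  # recall change for L1
```

In this toy case, L1 recall rises from 1/3 to 1, which is about 66.7 percentage points but a 200% relative increase, while L1 precision falls from 1.0 to 0.75 because of one false alarm; reporting both metrics per level exposes this trade-off.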
Similar articles

Machine Learning-Based Early Warning Level Prediction for Cyanobacterial Blooms Using Environmental Variable Selection and Data Resampling.

Toxics. 2023 Nov 23;11(12):955. doi: 10.3390/toxics11120955.

Deep-learning and data-resampling: A novel approach to predict cyanobacterial alert levels in a reservoir.

Environ Res. 2024 Dec 15;263(Pt 2):120135. doi: 10.1016/j.envres.2024.120135. Epub 2024 Oct 9.

A machine learning approach for early warning of cyanobacterial bloom outbreaks in a freshwater reservoir.

J Environ Manage. 2021 Jun 15;288:112415. doi: 10.1016/j.jenvman.2021.112415. Epub 2021 Mar 26.

Recent advances in algal bloom detection and prediction technology using machine learning.

Sci Total Environ. 2024 Aug 15;938:173546. doi: 10.1016/j.scitotenv.2024.173546. Epub 2024 May 27.

Current status and prospects of algal bloom early warning technologies: A Review.

J Environ Manage. 2024 Jan 1;349:119510. doi: 10.1016/j.jenvman.2023.119510. Epub 2023 Nov 9.

Machine learning based marine water quality prediction for coastal hydro-environment management.

J Environ Manage. 2021 Apr 15;284:112051. doi: 10.1016/j.jenvman.2021.112051. Epub 2021 Jan 28.

Algal Bloom Prediction Using Extreme Learning Machine Models at Artificial Weirs in the Nakdong River, Korea.

Int J Environ Res Public Health. 2018 Sep 21;15(10):2078. doi: 10.3390/ijerph15102078.

Forecasting freshwater cyanobacterial harmful algal blooms for Sentinel-3 satellite resolved U.S. lakes and reservoirs.

J Environ Manage. 2024 Jan 1;349:119518. doi: 10.1016/j.jenvman.2023.119518. Epub 2023 Nov 7.

Self-optimization of training dataset improves forecasting of cyanobacterial bloom by machine learning.

Sci Total Environ. 2023 Mar 25;866:161398. doi: 10.1016/j.scitotenv.2023.161398. Epub 2023 Jan 5.

Cited by

Estimates of Lake Nitrogen, Phosphorus, and Chlorophyll-a Concentrations to Characterize Harmful Algal Bloom Risk Across the United States.

Earths Future. 2024 Aug 26;12(8):e2024EF004493. doi: 10.1029/2024EF004493.

The future of critical care: AI-powered mortality prediction for acute variceal gastrointestinal bleeding and acute non-variceal gastrointestinal bleeding patients.

Front Med (Lausanne). 2025 May 16;12:1580094. doi: 10.3389/fmed.2025.1580094. eCollection 2025.

Improving Surgical Site Infection Prediction Using Machine Learning: Addressing Challenges of Highly Imbalanced Data.

Diagnostics (Basel). 2025 Feb 19;15(4):501. doi: 10.3390/diagnostics15040501.

Remote Sensing Inversion of Water Quality Grades Using a Stacked Generalization Approach.

Sensors (Basel). 2024 Oct 18;24(20):6716. doi: 10.3390/s24206716.

Machine Learning-Based Early Warning Level Prediction for Cyanobacterial Blooms Using Environmental Variable Selection and Data Resampling.

Toxics. 2023 Nov 23;11(12):955. doi: 10.3390/toxics11120955.

Ensemble Machine Learning of Gradient Boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for Harmful Algal Blooms Forecasting.

Toxins (Basel). 2023 Oct 10;15(10):608. doi: 10.3390/toxins15100608.

Detecting Starch-Head and Mildewed Fruit in Dried Hami Jujubes Using Visible/Near-Infrared Spectroscopy Combined with MRSA-SVM and Oversampling.

Foods. 2022 Aug 12;11(16):2431. doi: 10.3390/foods11162431.
