Heddam Salim, Yaseen Zaher Mundher, Falah Mayadah W, Goliatt Leonardo, Tan Mou Leong, Sa'adi Zulfaqar, Ahmadianfar Iman, Saggi Mandeep, Bhatia Amandeep, Samui Pijush
Laboratory of Research in Biodiversity Interaction Ecosystem and Biotechnology, Hydraulics Division, Agronomy Department, Faculty of Science, University 20 Août 1955, Route El Hadaik, BP 26, Skikda, Algeria.
Department of Earth Sciences and Environment, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia.
Environ Sci Pollut Res Int. 2022 Nov;29(51):77157-77187. doi: 10.1007/s11356-022-21201-1. Epub 2022 Jun 8.
This study evaluates the usefulness and effectiveness of four machine learning (ML) models for modelling cyanobacteria blue-green algae (CBGA) in two rivers located in the USA. The proposed modelling framework links five water quality variables to the concentration of CBGA. For this purpose, artificial neural network (ANN), extreme learning machine (ELM), random forest regression (RFR), and random vector functional link (RVFL) models were developed. First, the four models were developed using only the water quality variables. Second, based on the results of the first stage, a new modelling strategy was introduced based on preprocessing by signal decomposition. Empirical mode decomposition (EMD), variational mode decomposition (VMD), and the empirical wavelet transform (EWT) were used to decompose the water quality variables into several subcomponents, and the resulting intrinsic mode functions (IMFs) and multiresolution analysis (MRA) components were used as new input variables for the ML models. The results show that (i) among the single models, good predictive accuracy was obtained with the RFR model, with R and NSE values of ≈0.914 and ≈0.833 for the first station, and ≈0.944 and ≈0.884 for the second station, while the other models, i.e., ANN, RVFL, and ELM, failed to provide a good estimation of CBGA; (ii) the decomposition methods significantly improved the performance of the individual models; (iii) among the three decomposition methods, EMD was found to be superior to VMD and EWT; and (iv) the ANN and RFR were found to be more accurate than the ELM and RVFL models, with R and NSE values of ≈0.983 and ≈0.967, and ≈0.989 and ≈0.976, respectively.
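The two performance measures reported above can be illustrated with a minimal sketch. It assumes that R denotes the Pearson correlation coefficient and NSE the Nash-Sutcliffe efficiency, as is usual in hydrological modelling; the abstract does not spell out either definition. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(observed, simulated):
    """Pearson correlation coefficient between observed and simulated series."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    covariance = sum((o - mean_obs) * (s - mean_sim)
                     for o, s in zip(observed, simulated))
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    spread_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in simulated))
    return covariance / (spread_obs * spread_sim)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared model
    error to the variance of the observations around their mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / obs_variance


# Hypothetical CBGA concentrations, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
print(pearson_r(observed, simulated), nse(observed, simulated))
```

Under these definitions an NSE of 1 indicates a perfect fit, and an NSE of 0 indicates a model that does no better than predicting the mean of the observations. R measures only linear association, so a model with a systematic bias can score a high R and a much lower NSE, which is one reason the two are commonly reported together.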