Das Basanta Kumar, Paul Sanatan, Mandal Biswajit, Gogoi Pranab, Paul Liton, Saha Ajoy, Johnson Canciyal, Das Akankshya, Ray Archisman, Roy Shreya, Das Gupta Shubhadeep
ICAR-Central Inland Fisheries Research Institute, Barrackpore, Kolkata, 700120, West Bengal, India.
Environ Sci Pollut Res Int. 2025 Feb;32(8):4670-4689. doi: 10.1007/s11356-025-35999-z. Epub 2025 Jan 30.
Nitrate, a highly reactive form of inorganic nitrogen, is commonly found in aquatic environments. Understanding the dynamics of nitrate-N concentration in rivers and its interactions with other water-quality parameters is crucial for effective freshwater ecosystem management. This study uses advanced machine learning models to analyse water-quality parameters and predict nitrate-N concentrations in the lower stretch of the Ganga River from observations spanning six annual periods (2017 to 2022). The parameters are water temperature, pH, specific conductivity (Sp_Con), dissolved oxygen (DO), nitrate-N, total phosphate (TP), turbidity, biochemical oxygen demand (BOD), silicate, total dissolved solids (TDS), and rainfall. The study evaluated the predictive performance of five models: Multiple Polynomial Regression (MPR), Generalized Additive Models (GAMs), Decision Tree Regression, Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Performance was assessed using RMSE, MAE, MAPE, NSE and R² metrics. XGBoost emerged as the top performer, with an RMSE of 0.024, MAE of 0.018, MAPE of 51.805, NSE of 0.855 and R² of 0.85, explaining 85% of the variance in nitrate-N concentrations. Random Forest also demonstrated strong predictive capability, with an RMSE of 0.028, MAE of 0.021, MAPE of 57.272, NSE of 0.804 and R² of 0.80. MPR captured non-linear relationships reasonably well, explaining 75% of the variance, while Decision Tree Regression and GAMs were less effective, with R² values of 0.60 and 0.48, respectively. BOD, pH, rainfall, water temperature, and total phosphate were the best predictors of nitrate-N dynamics. Comparative analysis with previous studies confirmed the robustness of XGBoost and Random Forest in environmental data modelling. The findings highlight the value of advanced machine learning models for accurately predicting water-quality parameters and supporting proactive management strategies.
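The abstract names its evaluation metrics but does not give their formulas. The following is a minimal sketch of how these scores are commonly computed for paired observed and predicted values. It assumes the textbook definitions of RMSE, MAE, MAPE (as a percentage) and Nash-Sutcliffe efficiency (NSE), and it assumes the coefficient of determination is taken as the squared Pearson correlation; the study may have used a different variant. The function name `evaluate_predictions` and the sample values are hypothetical and are not data from the study.

```python
import math


def evaluate_predictions(observed, predicted):
    """Score paired series with RMSE, MAE, MAPE (%), NSE and squared Pearson r.

    Assumes no observed value is zero (MAPE is undefined otherwise) and that
    neither series is constant (NSE and r^2 would divide by zero).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    # NSE compares the model against always predicting the observed mean:
    # 1 is a perfect fit, 0 is no better than the mean.
    nse = 1.0 - ss_res / ss_tot

    # Squared Pearson correlation between observed and predicted values.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r2 = (cov * cov) / (ss_tot * var_pred)

    return {"rmse": rmse, "mae": mae, "mape": mape, "nse": nse, "r2": r2}


# Hypothetical nitrate-N values, for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
scores = evaluate_predictions(observed, predicted)
```

Because MAPE divides each error by the observed value, small observed concentrations inflate it. This is one plausible reason a model can show a low RMSE alongside a MAPE above 50, as reported here.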