Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa.
Institute for Smart Systems Technologies, Transportation Informatics, Alpen-Adria Universität Klagenfurt, 9020 Klagenfurt, Austria.
Sensors (Basel). 2022 Sep 27;22(19):7338. doi: 10.3390/s22197338.
Harmful cyanobacterial blooms (HCBs) are problematic for drinking water treatment, and some cyanobacterial strains produce toxins that significantly affect human health. To better control eutrophication and HCBs, catchment managers need to continuously track nitrogen (N) and phosphorus (P) in water bodies. However, high-frequency monitoring of these water quality indicators is not economical. In such cases, machine learning techniques may serve as viable alternatives since they can learn directly from the available surrogate data. In the present work, virtual sensors based on random forest, extremely randomized trees (ET), extreme gradient boosting, k-nearest neighbors, light gradient boosting machine, and bagging regressors were used to predict N and P in two catchments with contrasting land uses. The effects of data scaling and missing value imputation were also assessed, and Shapley additive explanations were used to rank feature importance. A specification book, sensitivity analysis, and best practices for developing virtual sensors are discussed. Results show that ET, the MinMax scaler, and a multivariate imputer were the best predictive model, scaler, and imputer, respectively. The highest predictive performance, reported in terms of R², was 97% in the rural catchment and 82% in the urban catchment.
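As a rough illustration of two quantities named in the abstract, the following minimal sketch shows min-max scaling of a surrogate feature and a goodness-of-fit score. It assumes the reported R denotes the coefficient of determination. The function names, the "turbidity" feature, and all numbers are hypothetical and are not taken from the study.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1].

    A constant column has no spread, so it is mapped to all zeros.
    """
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(value - lo) / span for value in column]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical surrogate readings (illustrative values only).
turbidity = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_scale(turbidity)

# Hypothetical observed vs. virtual-sensor-predicted nitrogen values.
observed_n = [1.0, 2.0, 3.0, 4.0]
predicted_n = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed_n, predicted_n)

print(scaled)
print(round(score, 2))
```

In a fuller workflow, the scaled surrogate features would be passed to a regressor such as extremely randomized trees, and the score would be computed on held-out data rather than on hand-written values.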