College of Environmental Science and Engineering, Donghua University, No. 2999 North Renmin Road, Shanghai, 201620, China.
Environ Geochem Health. 2024 Jan 16;46(2):31. doi: 10.1007/s10653-023-01778-3.
Laboratory determination of trihalomethanes (THMs) is time-consuming, so a model that predicts THM concentrations from easily obtainable water quality parameters would be valuable. This study compared three modeling approaches, random forest regression (RFR), support vector regression (SVR), and log-linear regression, for predicting the concentrations of total trihalomethanes (T-THMs), bromodichloromethane (BDCM), and dibromochloromethane (DBCM) from nine water quality parameters. The models were developed and tested on a dataset of 175 samples collected from a water treatment plant. With its optimal parameter combination, the RFR model outperformed the log-linear regression model in predicting T-THMs (N = 82-88%, r = 0.70-0.80), while the SVR model performed slightly better than the RFR model in predicting BDCM (N = 85-98%, r = 0.70-0.97). For T-THMs and DBCM, the RFR model outperformed both of the other models. The study concludes that the RFR model is superior overall to the SVR and log-linear regression models and could be used to monitor THM concentrations in water supply systems.