Na Iseul, Kim Taeho, Qiu Pengpeng, Son Younggyu
Department of Environmental Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea; Department of Energy Engineering Convergence, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea.
AI Lab, ROSIS IT, Seoul 07547, Republic of Korea.
Ultrason Sonochem. 2024 Nov;110:107032. doi: 10.1016/j.ultsonch.2024.107032. Epub 2024 Aug 21.
In this study, machine learning (ML) algorithms were employed to predict pseudo-first-order reaction rate constants for the sonochemical degradation of aqueous organic pollutants under various conditions. A total of 618 data sets were collected from 89 previous studies, covering ultrasonic, solution, and pollutant characteristics. Electrical power (Pelec) and calorimetric power (Pcal) differ, so the collected data were divided into two groups: data reported with Pelec and data reported with Pcal. Eight input variables were selected for ML: frequency, power density, pH, temperature, initial concentration, solubility, vapor pressure, and the octanol-water partition coefficient (Kow). The target variable was the degradation rate constant. Statistical analysis was conducted, and outliers were identified separately for the two groups. Random forest (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LGB) models were used to predict the pseudo-first-order rate constants for the removal of aqueous pollutants. Prediction performance was evaluated using the root mean squared error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²). Prediction performance was significantly higher with data from which outliers had been removed and with augmented data. All the applied ML models could therefore be used to predict the sonochemical degradation of aqueous pollutants, and the XGB model predicted the rate constants most accurately. In addition, Shapley additive explanation (SHAP) analysis identified power density and frequency as the most influential of the eight input variables. Finally, the trained XGB model was used to predict the degradation rate constants of two pollutants over a wide frequency range (20-1,000 kHz), and the predictions were analyzed.
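The abstract names the pseudo-first-order rate constant as the prediction target and RMSE, MAE, and R² as the evaluation metrics. The following minimal, standard-library-only sketch illustrates these quantities. It is not the authors' code.

It makes these assumptions:

- k is estimated from a concentration-time series. This assumes first-order decay, C(t) = C0·exp(-kt), and a least-squares fit of ln(C0/C) against t through the origin.
- Outliers are screened with a 1.5 × IQR rule. The abstract does not state which outlier criterion the study used, so this choice is only an assumption.
- All function names are hypothetical.

```python
import math
import statistics


def pseudo_first_order_k(times, concentrations):
    """Estimate k assuming C(t) = C0 * exp(-k t), i.e. ln(C0/C) = k t.

    Uses least squares through the origin: k = sum(t*y) / sum(t^2).
    """
    c0 = concentrations[0]
    ys = [math.log(c0 / c) for c in concentrations]
    num = sum(t * y for t, y in zip(times, ys))
    den = sum(t * t for t in times)
    return num / den


def filter_iqr(values, factor=1.5):
    """Keep values inside [Q1 - factor*IQR, Q3 + factor*IQR].

    ASSUMPTION: the IQR rule is illustrative; the study's criterion is not
    given in the abstract. Apply it to each power group (Pelec, Pcal) separately.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if lo <= v <= hi]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic decay curve with k = 0.05 per minute (illustrative only).
    t = [0, 10, 20, 30]
    c = [10 * math.exp(-0.05 * ti) for ti in t]
    print(pseudo_first_order_k(t, c))
    print(filter_iqr([10, 11, 12, 13, 14, 15, 16, 200]))
    print(rmse([1, 2, 3], [1, 2, 4]), mae([1, 2, 3], [1, 2, 4]), r_squared([1, 2, 3], [1, 2, 4]))
```

With real data, `y_true` would hold measured rate constants and `y_pred` the outputs of a fitted RF, XGB, or LGB regressor. Those models come from separate libraries (scikit-learn, xgboost, lightgbm) and are not reproduced here.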