College of Agriculture, Yangzhou University, Yangzhou 225009, China.
School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, China.
Molecules. 2024 Mar 20;29(6):1381. doi: 10.3390/molecules29061381.
Accurately predicting plant cuticle-air partition coefficients is essential for assessing the ecological risk of organic pollutants and for elucidating their partitioning mechanisms. The current work collected 255 measured values covering 25 plant species and 106 compounds (dataset (I)) and averaged them by compound to establish dataset (II), which contains one value for each of the 106 compounds. Four machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight quantitative structure-property relationship (QSPR) models for predicting these coefficients. The results showed that the developed models had a high goodness of fit, as well as good robustness and predictive performance. The GBDT-2 model (R²adj = 0.925, Q²LOO = 0.756, Q²BOOT = 0.864, R²ext = 0.837, Q²ext = 0.811, and = 0.891) performed best and is therefore recommended for predicting the coefficients. Moreover, interpreting the GBDT-1 and GBDT-2 models with the Shapley additive explanations (SHAP) method showed how molecular properties, such as molecular size, polarizability, and molecular complexity, affect the capacity of plant cuticles to adsorb airborne organic pollutants. The satisfactory performance of the developed models suggests that they could be widely applied to assess the environmental fate of organic pollutants and to support eco-friendly and sustainable chemical engineering.
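The validation statistics quoted for GBDT-2 can be made concrete with a short sketch. The formulas below are common QSPR conventions and are assumptions, because the abstract does not state the exact definitions the authors used: R²adj penalizes the training R² for the number of descriptors, Q²LOO refits the model once per left-out compound, and Q²ext is computed here in the Q²F1 form (denominator uses the training-set mean). The bootstrap statistic (Q²BOOT) and the unnamed final metric are not reproduced. MLR is used only because its leave-one-out refit is cheap; the function names and the synthetic data are hypothetical, not the paper's code or dataset.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply the fitted intercept and slopes to a descriptor matrix."""
    return coef[0] + X @ coef[1:]


def r2_adj(y, y_hat, n_descriptors):
    """Adjusted R^2 on the training set (assumed standard definition)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_descriptors - 1)


def q2_loo(X, y):
    """Leave-one-out Q^2: refit without each compound, predict it, pool PRESS."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def q2_ext(y_ext, y_ext_hat, y_train_mean):
    """External Q^2 in the Q^2_F1 form (assumption: training-set mean in denominator)."""
    ss_res = np.sum((y_ext - y_ext_hat) ** 2)
    ss_tot = np.sum((y_ext - y_train_mean) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic demo (hypothetical data): 100 "compounds", 3 "descriptors".
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([0.5, -1.2, 2.0]) + 0.1 * rng.normal(size=100)

X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

coef = fit_mlr(X_train, y_train)
train_r2_adj = r2_adj(y_train, predict_mlr(coef, X_train), X.shape[1])
train_q2_loo = q2_loo(X_train, y_train)
test_q2_ext = q2_ext(y_test, predict_mlr(coef, X_test), y_train.mean())

print(f"R2adj = {train_r2_adj:.3f}, Q2LOO = {train_q2_loo:.3f}, Q2ext = {test_q2_ext:.3f}")
```

The gap between a high R²adj and a lower Q²LOO (0.925 vs 0.756 for GBDT-2) is the usual signal that internal cross-validation is stricter than goodness of fit; with the near-noiseless synthetic data above, all three values come out close to 1.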