Sun Ting, Wei Chongzhi, Liu Yang, Ren Yueying
School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China.
School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China; Ministry of Education Engineering Research Center of Water Resource Comprehensive Utilization in Cold and Arid Regions, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China.
Sci Total Environ. 2024 Dec 20;957:177399. doi: 10.1016/j.scitotenv.2024.177399. Epub 2024 Nov 16.
A quantitative structure-activity relationship (QSAR) study was conducted on 313 pesticides to predict their acute toxicity to sheepshead minnow (Cyprinodon variegatus) using DRAGON descriptors. The OECD (Organisation for Economic Co-operation and Development) principles for the regulatory acceptability of QSAR models were followed throughout model construction and assessment. Nine descriptors were selected by forward stepwise regression and used as inputs to both linear and nonlinear models, which were validated internally and externally. In general, the machine learning methods, namely support vector machine (SVM), random forest (RF), and projection pursuit regression (PPR), performed better than the multiple linear regression (MLR) model. The statistical results (R = 0.682-0.933, Q = 0.604-0.659, Q = 0.740-0.796, CCC = 0.861-0.882) indicate that the developed models are robust, reliable, reproducible, accurate and predictive. The RF model performed best, giving a predictive correlation coefficient Q of 0.814, a root mean squared error (RMSE) of 0.658 and a mean absolute error (MAE) of 0.534 for the test set. The RF model, along with the SVM and PPR models, was visualized and explained using SHapley Additive exPlanations (SHAP) analysis to enhance its transparency and credibility. In addition, the applicability domain (AD) of the RF model was characterized with a Williams plot, and tree manifold approximation and projection (TMAP) was used to illustrate the similarity and diversity of the entire data space and to assist in the analysis of outliers. Activity cliffs were investigated using Arithmetic Residuals in K-groups Analysis (ARKA) descriptors: no pesticide was identified as an activity cliff in the training set or as a potential prediction cliff in the test set. The RF model therefore fulfills each OECD principle for regulatory QSAR models. This work will aid the in silico QSAR prediction of acute toxicity to sheepshead minnow for untested and new pesticides, and the approach can be extended to other studies.
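The external-validation statistics quoted in the abstract (RMSE, MAE, CCC and a predictive Q) can be computed along the following lines. This is a minimal sketch, not the authors' code: the function names are hypothetical, CCC is taken to be Lin's concordance correlation coefficient, and the Q statistic is assumed to be the common external Q² in its F1 form, since the abstract does not say which variant it reports. The `leverage_warning` helper uses the conventional Williams-plot cutoff h* = 3(p + 1)/n, which is also an assumption about how the applicability domain was bounded.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def q2_f1(y_true, y_pred, train_mean):
    """External Q^2 (F1 form, assumed): 1 - PRESS / SS about the training mean."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - train_mean) ** 2 for t in y_true)
    return 1.0 - press / ss


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / n
    var_t = sum((t - mt) ** 2 for t in y_true) / n
    var_p = sum((p - mp) ** 2 for p in y_pred) / n
    return 2.0 * cov / (var_t + var_p + (mt - mp) ** 2)


def leverage_warning(n_descriptors, n_train):
    """Conventional Williams-plot leverage cutoff h* = 3(p + 1) / n (assumed)."""
    return 3.0 * (n_descriptors + 1) / n_train


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(rmse(observed, predicted), mae(observed, predicted))
    print(q2_f1(observed, predicted, train_mean=2.0), ccc(observed, predicted))
```

With the nine selected descriptors, `leverage_warning(9, n_train)` would give the cutoff for a training set of size `n_train`; the abstract does not report the training/test split, so no value is filled in here.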