Improving predictions and understanding of primary and ultimate biodegradation rates with machine learning models.

Author information

School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China.


Publication information

Sci Total Environ. 2023 Dec 15;904:166623. doi: 10.1016/j.scitotenv.2023.166623. Epub 2023 Aug 29.

Abstract

This study aimed to develop machine-learning-based quantitative structure-biodegradability relationship (QSBR) models for predicting the primary and ultimate biodegradation rates of organic chemicals, which are essential parameters for environmental risk assessment. For this purpose, highly consistent experimental primary and ultimate biodegradation rates were compiled for 173 organic compounds. A large number of descriptors were calculated with a collection of quantum/computational chemistry software and tools to achieve comprehensive representation and interpretability. Following a pre-screening process, multiple QSBR models were developed for both the primary and ultimate endpoints using three algorithms: extreme gradient boosting (XGBoost), support vector machine (SVM), and multiple linear regression (MLR). Furthermore, a unified QSBR model was constructed using the knowledge transfer technique and XGBoost. Results demonstrated that all QSBR models developed in this study performed well. In particular, the SVM models exhibited a high level of goodness of fit (coefficient of determination on the training set of 0.973 for primary and 0.980 for ultimate), robustness (leave-one-out cross-validated coefficient of 0.953 for primary and 0.967 for ultimate), and external predictive ability (external explained variance of 0.947 for primary and 0.958 for ultimate). The knowledge transfer technique enhanced model performance by learning from the properties of the two biodegradation endpoints. Williams plots were used to visualize the applicability domains of the models. Through SHapley Additive exPlanations (SHAP) analysis, this study identified key features affecting biodegradation rates. Notably, MDEO-12, APC2D1_C_O, and other features contributed positively to primary biodegradation, while AATS0v, AATS2v, and others inhibited it. For ultimate biodegradation, features such as No. of Rotatable Bonds, APC2D1_C_O, and minHBa were positive contributors, while C1SP3, Halogen Ratio, GGI4, and others hindered the process. The study also quantified the contribution of each feature to the predictions for individual chemicals. This research provides valuable tools for predicting both primary and ultimate biodegradation rates while offering insights into the underlying mechanisms.
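The validation quantities named in the abstract (training-set coefficient of determination, leave-one-out cross-validated coefficient, and the leverages plotted in a Williams plot) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a toy single-descriptor linear model, invented data, one common definition of the leave-one-out coefficient (1 - PRESS / total sum of squares), and the conventional leverage warning threshold of 3(p + 1)/n. All function names are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return ym - b * xm, b

def r_squared(x, y):
    """Coefficient of determination on the training data."""
    a, b = fit_line(x, y)
    ym = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ym) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def q2_loo(x, y):
    """Leave-one-out coefficient: refit without each point, then predict it."""
    ym = mean(y)
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - ym) ** 2 for yi in y)
    return 1 - press / ss_tot

def leverages(x):
    """Hat-matrix diagonal for a one-descriptor model with intercept."""
    n, xm = len(x), mean(x)
    sxx = sum((xi - xm) ** 2 for xi in x)
    return [1 / n + (xi - xm) ** 2 / sxx for xi in x]

def leverage_threshold(n, p=1):
    """Conventional Williams-plot warning leverage, 3(p + 1)/n."""
    return 3 * (p + 1) / n

# Invented toy data: one descriptor value and one rate per compound.
# The last compound sits far from the others in descriptor space.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [0.9, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 7.9, 9.1, 29.5]

r2 = r_squared(x, y)
q2 = q2_loo(x, y)
h = leverages(x)
h_star = leverage_threshold(len(x))
outside_domain = [i for i, hi in enumerate(h) if hi > h_star]
print(f"R2={r2:.3f}  Q2_LOO={q2:.3f}  h*={h_star:.2f}  flagged={outside_domain}")
```

In a Williams plot, each compound's leverage is plotted against its standardized residual; compounds whose leverage exceeds the threshold lie outside the descriptor space covered by the training set, so predictions for them are extrapolations.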

