Developing machine learning approaches to identify candidate persistent, mobile and toxic (PMT) and very persistent and very mobile (vPvM) substances based on molecular structure.

Author information

Han Min, Jin Biao, Liang Jun, Huang Chen, Arp Hans Peter H

Affiliations

State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 10069, China.

Publication information

Water Res. 2023 Oct 1;244:120470. doi: 10.1016/j.watres.2023.120470. Epub 2023 Aug 9.

Abstract

Determining which substances on the global market could be classified as persistent, mobile and toxic (PMT) or very persistent and very mobile (vPvM) is essential to prevent or reduce drinking water contamination by them. This study developed machine learning models based on different molecular descriptors (MDs) and defined applicability domains for the screening of PMT/vPvM substances. The models were trained on 3111 substances with expert weight-of-evidence-based PMT/vPvM hazard classifications that considered the highest quality data available. The models rest on the hypothesis that PMT/vPvM substances share similar MDs that are representative of chemical structures resistant to degradation, are associated with low sorption (or high water solubility) and, in some cases, are associated with known toxic mechanisms. All possible model combinations were tested by integrating different molecular description methods, data balancing strategies and machine learning algorithms. Our model allows one-step prediction of candidate PMT/vPvM substances, and it was compared with an approach that predicts P, M and T separately (i.e. three-step prediction). The one-step model achieved higher accuracy than the three-step approach: 92% for PMT/vPvM identification (i.e. positive samples) on an internal test set and 90% on an external test set of chemical pollutants detected in Taihu Lake, China. Furthermore, the prediction mechanism of the model was interpreted using Shapley additive explanations (SHAP). This work advances big-data in silico screening models for identifying substances that potentially meet the PMT/vPvM criteria.
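As a rough illustration of two ideas in the abstract, the sketch below enumerates a grid of model combinations and contrasts the three-step decision rule with a single joint prediction. It is not the authors' code. The option names in the three lists are hypothetical placeholders, because the abstract does not say which descriptors, balancing strategies or algorithms were used, and `joint_recall` assumes the P, M and T classifiers err independently, which the abstract does not claim.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not list the actual options.
DESCRIPTORS = ["descriptor_set_A", "descriptor_set_B"]
BALANCING = ["none", "oversample", "undersample"]
ALGORITHMS = ["algorithm_X", "algorithm_Y"]


def model_grid():
    """Enumerate every (descriptor, balancing, algorithm) combination."""
    return list(product(DESCRIPTORS, BALANCING, ALGORITHMS))


def three_step_label(is_p: bool, is_m: bool, is_t: bool) -> bool:
    """Three-step route: flag a PMT candidate only if P, M and T are all predicted."""
    return is_p and is_m and is_t


def joint_recall(per_step_recalls):
    """Recall on true positives when every step must be right.

    Assumes the steps make independent errors (an illustrative assumption).
    """
    result = 1.0
    for recall in per_step_recalls:
        result *= recall
    return result


if __name__ == "__main__":
    print(len(model_grid()), "candidate model configurations")
    print("P and M predicted, T missed:", three_step_label(True, True, False))
    print("Joint recall of three 0.9 steps:", joint_recall([0.9, 0.9, 0.9]))
```

Under the independence assumption, errors in the separate P, M and T steps multiply, which is one plausible reason a single joint classifier could score higher on positive samples. The abstract itself reports only the outcome of the comparison, not the cause.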
