

High-throughput prediction of oral acute toxicity in Rat and Mouse of over 100,000 polychlorinated persistent organic pollutants (PC-POPs) by interpretable data fusion-driven machine learning global models.

Affiliations

Beijing Key Laboratory of Environmental and Viral Oncology, College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, PR China.

Beijing Key Laboratory of Environmental and Viral Oncology, College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, PR China; Department of Medical Technology, Beijing Pharmaceutical University of Staff and Workers, Beijing 100079, China.

Publication information

J Hazard Mater. 2024 Dec 5;480:136295. doi: 10.1016/j.jhazmat.2024.136295. Epub 2024 Oct 28.

Abstract

This study used available oral acute toxicity data in Rat and Mouse for polychlorinated persistent organic pollutants (PC-POPs) to construct data fusion-driven machine learning (ML) global models. Based on atom-centered fragments (ACFs), the collected high-throughput data overcame the applicability limitations, enabling accurate toxicity prediction for a wide range of PC-POPs series compounds using only single models. The data variances in the Rat training and test sets were 1.52 and 1.34, respectively, while for Mouse the values were 1.48 and 1.36, respectively. A genetic algorithm (GA) was used to build multiple linear regression (MLR) models and pre-screen descriptors, addressing the "black-box" problem prevalent in ML and enhancing model interpretability. The best ML models for Rat and Mouse achieved approximately 90% prediction reliability for over 100,000 true untested compounds. Ultimately, a warning list of highly toxic compounds for eight categories of polychlorinated atom-centered fragments (PCACFs) was generated based on the prediction results. The analysis of descriptors revealed that dioxin analogs generally exhibited higher toxicity, because the heteroatoms and ring systems increased structural complexity and formed larger conjugated systems, contributing to greater oral acute toxicity. The present study provides valuable insights for guiding subsequent in vivo tests, environmental risk assessment, and the improvement of the global governance system for pollutants.
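The GA-driven descriptor pre-screening for MLR mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic descriptor matrix, the function names (`fit_mlr`, `r2`, `ga_select`), the fitness criterion (R² on a held-out split), and all GA settings are assumptions chosen only to show the general idea of evolving a subset of descriptors for a linear model.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r2(X, y, coef):
    """Coefficient of determination of an MLR model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def ga_select(X_tr, y_tr, X_va, y_va, n_init=3, pop_size=30,
              n_gen=40, p_mut=0.1, seed=0):
    """Evolve boolean descriptor masks; fitness is validation R^2 (assumed)."""
    rng = np.random.default_rng(seed)
    n_desc = X_tr.shape[1]

    def fitness(mask):
        if not mask.any():              # an empty descriptor set is invalid
            return -np.inf
        coef = fit_mlr(X_tr[:, mask], y_tr)
        return r2(X_va[:, mask], y_va, coef)

    def random_mask():
        m = np.zeros(n_desc, dtype=bool)
        m[rng.choice(n_desc, size=n_init, replace=False)] = True
        return m

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        order = np.argsort(scores)[::-1]
        elite = [pop[i] for i in order[:pop_size // 2]]   # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            take_a = rng.random(n_desc) < 0.5             # uniform crossover
            child = np.where(take_a, elite[a], elite[b])
            child = child ^ (rng.random(n_desc) < p_mut)  # bit-flip mutation
            children.append(child)
        pop = elite + children

    scores = np.array([fitness(m) for m in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])

# Synthetic stand-in for a descriptor table: only columns 0 and 4 matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 3.0 * X[:, 4] + rng.normal(scale=0.1, size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

best_mask, best_score = ga_select(X_tr, y_tr, X_va, y_va)
print("selected descriptor indices:", np.flatnonzero(best_mask))
print("validation R^2:", round(best_score, 3))
```

Because the final model is a linear regression on a small, explicitly selected descriptor subset, each retained descriptor has a readable coefficient, which is the sense in which such pre-screening supports interpretability.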

