Construction of interpretable ensemble learning models for predicting bioaccumulation parameters of organic chemicals in fish.

Author information

Zhu Minghua, Xiao Zijun, Zhang Tao, Lu Guanghua

Affiliations

Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China.

Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China.

Publication information

J Hazard Mater. 2025 Jan 15;482:136606. doi: 10.1016/j.jhazmat.2024.136606. Epub 2024 Nov 20.

DOI: 10.1016/j.jhazmat.2024.136606

PMID: 39579709

Abstract

Accurate prediction of bioaccumulation parameters is essential for assessing the exposure, hazards, and risks of chemicals. However, most prediction models for bioaccumulation parameters are individual models built on a single algorithm and offer no model interpretation; as a result, their prediction accuracy is limited by the inherent constraints of that algorithm, and their interpretability is weak. Ensemble learning (EL), which combines multiple algorithms, coupled with the SHapley Additive exPlanations (SHAP) method, may overcome these limitations. Herein, EL models were constructed for three bioaccumulation parameters using datasets covering 2496 chemicals. The EL models achieved higher prediction accuracy than both the individual models developed in this study and those from previous research, with a coefficient of determination (R²) of up to 0.861 on the validation sets. Applicability domains (ADs) were characterized using a structure-activity landscape-based methodology. The optimal EL models, together with their ADs, were successfully used to predict bioaccumulation parameters for 4374 chemicals included in the Inventory of Existing Chemical Substances of China. Model interpretation using the SHAP method offered insight into the key features influencing bioaccumulation potential, including hydrophobicity, water solubility, polarizability, ionization potential, molecular weight, and molecular volume. Overall, the study provides data and models to support the sound management and risk assessment of chemicals.
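
The abstract says the EL models combine multiple algorithms, are scored by the coefficient of determination, and are interpreted with SHAP, but it does not state which base learners or combination rule were used. The sketch below is a minimal, dependency-free illustration under stated assumptions only: an equal-weight averaging ensemble of two hypothetical base models, the standard R² formula (1 - SS_res / SS_tot), and exact Shapley values computed by brute-force enumeration of feature coalitions against a fixed baseline. The descriptor names, coefficients, and baseline values are all made up for illustration; this is not the authors' pipeline, and the SHAP library itself relies on faster estimators (for example TreeSHAP for tree ensembles) rather than full enumeration.

```python
from itertools import combinations
from math import factorial
from statistics import mean

# Hypothetical descriptor order (illustrative only, not the paper's feature set):
# x = [log_kow, log_water_solubility, polarizability]


def linear_model(x):
    """Hypothetical linear base model (coefficients are made up)."""
    return 0.8 * x[0] - 0.3 * x[1] + 0.05 * x[2]


def stump_model(x):
    """Hypothetical decision-stump base model that splits on log_kow only."""
    return 3.0 if x[0] > 4.0 else 1.0


def ensemble_predict(models, x, weights=None):
    """Weighted average of base-model predictions (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    return sum(w * m(x) for w, m in zip(weights, models))


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    A feature outside the coalition is replaced by its baseline value.
    Cost grows as 2^n model calls, so this suits only a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


if __name__ == "__main__":
    models = [linear_model, stump_model]

    def ensemble(x):
        return ensemble_predict(models, x)

    chemical = [5.0, -4.0, 30.0]  # hypothetical descriptor values
    baseline = [3.0, -2.0, 20.0]  # e.g. training-set feature means (hypothetical)
    phi = shapley_values(ensemble, chemical, baseline)
    print("prediction:", ensemble(chemical))
    print("Shapley attributions:", phi)
```

Brute-force enumeration is shown because it makes the additivity (efficiency) property explicit: the attributions for one chemical sum to its ensemble prediction minus the baseline prediction, which is what lets SHAP values be read as per-feature contributions such as those attributed to hydrophobicity or water solubility.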

Similar articles

Integrated Transfer Learning and Multitask Learning Strategies to Construct Graph Neural Network Models for Predicting Bioaccumulation Parameters of Chemicals.

Environ Sci Technol. 2024 Sep 3;58(35):15650-15660. doi: 10.1021/acs.est.4c02421. Epub 2024 Jul 25.

QSAR modelling study of the bioconcentration factor and toxicity of organic compounds to aquatic organisms using machine learning and ensemble methods.

Ecotoxicol Environ Saf. 2019 Sep 15;179:71-78. doi: 10.1016/j.ecoenv.2019.04.035. Epub 2019 Apr 23.

A Toxicokinetic Framework and Analysis Tool for Interpreting Organisation for Economic Co-operation and Development Guideline 305 Dietary Bioaccumulation Tests.

Environ Toxicol Chem. 2020 Jan;39(1):171-188. doi: 10.1002/etc.4599. Epub 2019 Nov 30.

Predictive modeling and interpretability analysis of bioconcentration factors for organic chemicals in fish using machine learning.

Environ Pollut. 2025 Jul 15;377:126323. doi: 10.1016/j.envpol.2025.126323. Epub 2025 May 8.

Machine learning-based q-RASAR predictions of the bioconcentration factor of organic molecules estimated following the organisation for economic co-operation and development guideline 305.

J Hazard Mater. 2024 Nov 5;479:135725. doi: 10.1016/j.jhazmat.2024.135725. Epub 2024 Sep 3.

BCDPi: An interpretable multitask deep neural network model for predicting chemical bioconcentration in fish.

Environ Res. 2025 Jan 1;264(Pt 2):120356. doi: 10.1016/j.envres.2024.120356. Epub 2024 Nov 15.

Metabolic biotransformation half-lives in fish: QSAR modeling and consensus analysis.

Sci Total Environ. 2014 Feb 1;470-471:1040-6. doi: 10.1016/j.scitotenv.2013.10.068. Epub 2013 Nov 14.

An innovative machine learning approach for slope stability prediction by combining shap interpretability and stacking ensemble learning.

Environ Sci Pollut Res Int. 2025 May;32(21):12827-12843. doi: 10.1007/s11356-025-36406-3. Epub 2025 May 7.

Interpretable lung cancer risk prediction using ensemble learning and XAI based on lifestyle and demographic data.

Comput Biol Chem. 2025 Aug;117:108438. doi: 10.1016/j.compbiolchem.2025.108438. Epub 2025 Mar 27.