Dong Xuanzhi, Xu Zhenpeng, Zhao Hongxia, Wu Di, Qu Baocheng, Liu Siyu, Xiao Bing
Key Laboratory of Facility Fisheries (Ministry of Education), School of Marine Science, Technology and Environment, Dalian Ocean University, Dalian, 116024, China.
Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China.
Environ Pollut. 2025 Jul 15;377:126323. doi: 10.1016/j.envpol.2025.126323. Epub 2025 May 8.
Chemicals that are misused and released into the environment can harm people and ecosystems, so assessing their potential environmental risks before use is crucial. The bioconcentration factor (BCF) is a key parameter for describing the extent of chemical bioaccumulation, but determining BCF values experimentally is often time-consuming and costly. In this study, machine learning (ML) models were developed to predict BCF values from molecular descriptors using 9 algorithms. The random forest (RF) model demonstrated strong predictive performance, achieving R² values of 0.949 and 0.935, and it required only 10 easily obtainable features. The Tanimoto similarity coefficient, based on molecular structure, was used to characterize the applicability domain (AD). The SHAP method identified the primary factors that significantly affect BCF values, including hydrophobicity, molecular volume and shape, polarizability, and lipophilicity. Furthermore, partial dependence plots (PDP) and 2D interaction plots were used to examine the relationship between feature values and model predictions in more depth. Results showed that MollogP>4.5, SM1_Dzv>0, SM1_Dzp>0, and ZM1C1>35 were linked to higher lgBCF values (3.2 L/kg), indicating stronger bioconcentration potential. Conversely, for chemicals outside these ranges, which suggests weaker bioconcentration capacity, the focus should shift to their environmental migration. The study provides valuable insights into the factors that influence the bioaccumulation of chemicals, and the RF model can be an effective tool for assessing the bioconcentration potential of chemicals.
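The applicability-domain check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which fingerprint or similarity cutoff was used, so the fingerprints below (sets of "on" bit indices), the function names, and the 0.5 threshold are all hypothetical placeholders rather than the authors' settings.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query, training, threshold=0.5):
    """True if the most similar training compound reaches the threshold.

    The threshold is an assumed value, not one reported in the abstract.
    """
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best >= threshold


# Hypothetical training-set fingerprints (bit indices are made up).
training_fps = [{1, 4, 7, 9}, {2, 4, 8}, {1, 2, 3, 4}]

print(in_domain({1, 4, 7}, training_fps))  # nearest neighbour shares 3 of 4 bits
print(in_domain({20, 21}, training_fps))   # shares no bits with any training compound
```

A query compound that falls outside the domain under such a rule would be one whose predicted BCF should be treated with extra caution, since the model saw no structurally similar compounds during training.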