Jin Ruyue, Liang Yuzhen, Shi Zhenqing
School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China.
The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China.
Environ Sci Process Impacts. 2025 Jul 16;27(7):1889-1901. doi: 10.1039/d5em00029g.
This study aims to improve the prediction and understanding of dissolved organic carbon-water partitioning coefficients (K_DOC), a crucial parameter in environmental risk assessment. A dataset of 709 data points covering 190 unique organic pollutants and various types of dissolved organic matter (DOM) was compiled. Molecular descriptors characterizing each compound's properties and structure were calculated using Multiwfn, PaDEL and RDKit. Separate machine learning models were established for four DOM groupings: all DOM, natural aquatic DOM, natural terrestrial DOM and commercial DOM. These models exhibited excellent goodness-of-fit, internal stability, and predictive performance, with the corresponding statistics > 0.771, > 0.602, and > 0.629, respectively, and RMSE ranging from 0.413 to 0.580. Shapley additive explanation analysis identified CrippenLogP and MATS2m as the most influential descriptors. CrippenLogP, reflecting hydrophobicity, positively influenced K_DOC, while MATS2m, characterizing molecular branching and compactness, had a negative effect. Mor29m, where lower values indicate a higher abundance of heteroatoms such as halogens, also showed a negative impact, likely due to enhanced interactions with polar DOM groups. SlogP_VSA1, another descriptor related to hydrophobicity, correlated positively with log K_DOC in natural aquatic DOM, while its negative correlation in all DOM may reflect the great diversity of DOM properties in that group. Partial dependence plots revealed that organic pollutants tended to partition more into DOM when CrippenLogP > 6, Mor29m was between 0.45 and 0.52, MATS2m < -0.015, and SlogP_VSA1 < 7. These findings support the application of machine learning models for assessing pollutant interactions with DOM, contributing to improved environmental risk predictions.
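The descriptor ranges reported from the partial dependence plots can be read as a rough screening checklist. The following is a minimal sketch of that reading, not the authors' model: the `Descriptors` class, the function names, the inclusive bounds on Mor29m, and the example descriptor values are all assumptions made for illustration, and the descriptor values themselves would have to be computed beforehand with tools such as PaDEL or RDKit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for the four descriptors named in the abstract."""
    crippen_logp: float
    mor29m: float
    mats2m: float
    slogp_vsa1: float


def favourable_ranges(d: Descriptors) -> dict:
    """Check each descriptor against the range the abstract associates
    with stronger partitioning into DOM.

    Assumption: the Mor29m interval is treated as inclusive at both ends.
    """
    return {
        "CrippenLogP > 6": d.crippen_logp > 6,
        "0.45 <= Mor29m <= 0.52": 0.45 <= d.mor29m <= 0.52,
        "MATS2m < -0.015": d.mats2m < -0.015,
        "SlogP_VSA1 < 7": d.slogp_vsa1 < 7,
    }


def count_favourable(d: Descriptors) -> int:
    """Number of descriptors (0 to 4) that fall in the favourable ranges."""
    return sum(favourable_ranges(d).values())


# Made-up descriptor values, for illustration only.
hydrophobic = Descriptors(crippen_logp=6.8, mor29m=0.48, mats2m=-0.03, slogp_vsa1=5.2)
polar = Descriptors(crippen_logp=2.1, mor29m=0.30, mats2m=0.05, slogp_vsa1=12.0)

print(count_favourable(hydrophobic))  # 4
print(count_favourable(polar))        # 0
```

Such a count is only a qualitative flag. Partial dependence describes the averaged effect of one descriptor at a time, so meeting all four conditions does not replace a prediction from the fitted models.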