Zhu Tengyi, Zhang Yu, Li Yi, Tao Tianyun, Tao Cuicui
School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China.
J Hazard Mater. 2023 Oct 5;459:132320. doi: 10.1016/j.jhazmat.2023.132320. Epub 2023 Aug 18.
The root concentration factor (RCF) is a key parameter for characterizing the uptake and accumulation of hazardous organic contaminants (HOCs) by plant roots. However, complex interactions among chemicals, plant roots, and soil make it difficult to identify the mechanisms underlying HOC uptake and accumulation. Here, nine machine learning techniques were applied to investigate the major factors controlling RCF, using different combinations of molecular descriptors (MD), MACCS fingerprints, quantum chemistry descriptors (QCD), and three physicochemical properties related to the chemical-soil-plant system. Compared with models whose variables included MACCS fingerprints or the physicochemical properties alone, the XGBoost-6 model, built from the combination of MD, QCD, and the three physicochemical properties, performed best, with an R of 0.977. Model interpretation based on permutation variable importance and partial dependence plots showed that HOC lipophilicity, the lipid content of plant roots, soil organic matter content, and the overall deformability and molecular dispersive ability of HOCs are the most important factors regulating RCF. Integrating MD and QCD with physicochemical properties can improve understanding of the mechanisms of HOC accumulation in plant roots from a structural perspective. The same strategy of combining multiple variable types to improve model performance can be extended to the prediction of other parameters in environmental risk assessment.
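The permutation variable importance mentioned in the abstract can be illustrated with a minimal sketch: shuffle one input column at a time and record how much the model's score drops. Everything below is an assumption made for illustration only. The feature names, the coefficients in `toy_predict`, and the synthetic data are hypothetical, and `toy_predict` is a plain stand-in for a fitted model, not the XGBoost-6 model or the dataset from the paper.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination between observed and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving all other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = r_squared(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical feature names; the last one is a pure-noise control.
FEATURES = ["log_kow", "root_lipid", "soil_om", "noise"]


def toy_predict(row):
    """Stand-in for a fitted model; the coefficients are made up."""
    log_kow, root_lipid, soil_om, _noise = row
    return 0.8 * log_kow + 0.5 * root_lipid - 0.3 * soil_om


# Synthetic data: the target is the toy model output plus small Gaussian noise.
data_rng = random.Random(42)
X = [
    [data_rng.uniform(0, 6), data_rng.uniform(0, 3),
     data_rng.uniform(0, 5), data_rng.uniform(0, 1)]
    for _ in range(200)
]
y = [toy_predict(row) + data_rng.gauss(0, 0.1) for row in X]

baseline, importances = permutation_importance(toy_predict, X, y)
for name, value in zip(FEATURES, importances):
    print(f"{name}: mean R^2 drop = {value:.3f}")
```

In this toy setup the shuffled `noise` column leaves the score unchanged because the stand-in model ignores it, while shuffling `log_kow` causes the largest drop. With a real fitted regressor, `predict` would wrap that model's prediction call instead of `toy_predict`.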