Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States.
Environ Sci Technol. 2022 Feb 1;56(3):2054-2064. doi: 10.1021/acs.est.1c05398. Epub 2022 Jan 7.
Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, obtaining these descriptors accurately and quickly for new organic chemicals remains difficult. In this research, models (PaDEL-DNN) that require only the SMILES of a chemical were built to estimate pp-LFER descriptors using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors performed well in modeling the logarithmic storage-lipid/water partition coefficient, bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (freesolve). Then, assuming that the accuracy of estimates for a widely available property, e.g., logP (octanol-water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When the pp-LFER descriptors were used to model the storage-lipid/water partition coefficient, BCF, ESOL, and freesolve, errors were around 0.1 log unit lower for chemicals whose estimated pp-LFER descriptors were deemed "accurate" by the surrogate metric. Interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) "similar" chemicals in the training data set was crucial for accurate estimation, while the remaining, less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric and the newly estimated descriptors will benefit chemical transfer modeling.
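To make the surrogate-metric idea concrete, here is a minimal Python sketch. It assumes the common Abraham-type pp-LFER form (log K = c + eE + sS + aA + bB + vV), which the abstract does not spell out. The class and function names, the tolerance, and all numeric values are hypothetical placeholders for illustration, not the authors' implementation or fitted coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Solute descriptors in the assumed Abraham-type form (E, S, A, B, V)."""
    E: float
    S: float
    A: float
    B: float
    V: float


@dataclass(frozen=True)
class SystemCoefficients:
    """System-specific pp-LFER coefficients (c, e, s, a, b, v)."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def pp_lfer(d: Descriptors, k: SystemCoefficients) -> float:
    """Evaluate log K = c + e*E + s*S + a*A + b*B + v*V for one solute."""
    return k.c + k.e * d.E + k.s * d.S + k.a * d.A + k.b * d.B + k.v * d.V


def passes_surrogate(d: Descriptors, logp_system: SystemCoefficients,
                     reference_logp: float, tolerance: float) -> bool:
    """Flag descriptors as "accurate" when the logP they imply is within
    `tolerance` log units of a reference logP (the tolerance is an assumption)."""
    return abs(pp_lfer(d, logp_system) - reference_logp) <= tolerance


# Placeholder numbers only; they are not fitted or published coefficients.
estimated = Descriptors(E=1.0, S=0.5, A=0.0, B=0.25, V=1.0)
octanol_water = SystemCoefficients(c=0.0, e=0.5, s=-1.0, a=0.0, b=-4.0, v=4.0)

implied_logp = pp_lfer(estimated, octanol_water)
accurate = passes_surrogate(estimated, octanol_water,
                            reference_logp=3.2, tolerance=0.5)
```

Under this reading, descriptors that pass the logP check would be trusted more when they are plugged into pp-LFERs for less available properties such as BCF or hydration free energy.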