Ulrich Nadin, Goss Kai-Uwe, Ebert Andrea
Department of Analytical Environmental Chemistry, Helmholtz Centre for Environmental Research-UFZ, Leipzig, Germany.
Institute of Chemistry, University of Halle-Wittenberg, Halle, Germany.
Commun Chem. 2021 Jun 14;4(1):90. doi: 10.1038/s42004-021-00528-9.
Ever more data are freely available today. Building on these large datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. As an example, we selected the octanol-water partition coefficient (log P), which plays an essential role in environmental chemistry and toxicology as well as in chemical analysis. The developed DNN predicts well, with a root-mean-square error (RMSE) of 0.47 log units on the test dataset and an RMSE of 0.33 on an external dataset from the SAMPL6 challenge. To achieve this, we trained the DNN using data augmentation that considers all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help curate the log P dataset by identifying potential errors, and we address limitations of the dataset itself.
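The error metric quoted in the abstract can be illustrated with a minimal sketch. It assumes the conventional definition of the root-mean-square error (the square root of the mean squared difference between predicted and measured values). The function name `rmse` and the log P values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between paired predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up log P values for illustration only (not data from the paper).
measured_log_p = [1.0, 2.0, 3.0, 4.0]
predicted_log_p = [1.5, 1.5, 3.5, 3.5]

example_rmse = rmse(predicted_log_p, measured_log_p)
print(f"RMSE = {example_rmse:.2f} log units")
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones. This sensitivity to large individual errors is one reason erroneous entries in a dataset can visibly affect the metric.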