Exploring the octanol-water partition coefficient dataset using deep learning techniques and data augmentation.

Authors

Ulrich Nadin, Goss Kai-Uwe, Ebert Andrea

Affiliations

Department of Analytical Environmental Chemistry, Helmholtz Centre for Environmental Research-UFZ, Leipzig, Germany.

Institute of Chemistry, University of Halle-Wittenberg, Halle, Germany.

Publication information

Commun Chem. 2021 Jun 14;4(1):90. doi: 10.1038/s42004-021-00528-9.

Abstract

Today, more and more data are freely available. Based on these large datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. As an example, we have selected the octanol-water partition coefficient (log P), which plays an essential role in environmental chemistry and toxicology as well as in chemical analysis. The predictive performance of the developed DNN is good, with a root-mean-square error (RMSE) of 0.47 log units on the test dataset and an RMSE of 0.33 on an external dataset from the SAMPL6 challenge. To this end, we trained the DNN using data augmentation that considers all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and we address limitations of the dataset itself.
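The RMSE figures quoted in the abstract measure the typical deviation between predicted and measured log P values, in log units. A minimal sketch of that metric follows. The function name `rmse` and the sample values are assumptions made for illustration; they are not the authors' code or data.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences of log P values."""
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (hypothetical, not data from the paper).
measured_log_p = [1.0, 2.0, 3.0, 4.0]
predicted_log_p = [1.5, 1.5, 3.5, 3.5]

print(rmse(predicted_log_p, measured_log_p))
```

Because the errors are squared before averaging, a few large mispredictions raise the RMSE more than many small ones, which is worth keeping in mind when comparing the test-set and SAMPL6 values.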

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf0e/9814212/984580fd8def/42004_2021_528_Fig1_HTML.jpg
