Exploring the octanol-water partition coefficient dataset using deep learning techniques and data augmentation.

Author information

Ulrich Nadin, Goss Kai-Uwe, Ebert Andrea

Affiliations

Department of Analytical Environmental Chemistry, Helmholtz Centre for Environmental Research-UFZ, Leipzig, Germany.

Institute of Chemistry, University of Halle-Wittenberg, Halle, Germany.

Publication information

Commun Chem. 2021 Jun 14;4(1):90. doi: 10.1038/s42004-021-00528-9.

DOI:10.1038/s42004-021-00528-9

PMID:36697535

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9814212/

Abstract

Today, more and more data are freely available. Building on these large datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. As an example, we selected the octanol-water partition coefficient (log P), which plays an essential role in environmental chemistry and toxicology as well as in chemical analysis. The developed DNN performs well, with an RMSE of 0.47 log units on the test dataset and an RMSE of 0.33 on an external dataset from the SAMPL6 challenge. To achieve this, we trained the DNN using data augmentation that considers all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help curate the log P dataset by identifying potential errors, and we address limitations of the dataset itself.
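The abstract reports model quality as a root-mean-square error (RMSE) in log units and mentions using model predictions to spot potential errors in the dataset. The sketch below illustrates both ideas in plain Python. It is not the authors' code: the function names, the 1.0 log-unit threshold, and the sample values are hypothetical and chosen only for illustration.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences of log P values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def flag_suspects(names, predicted, measured, threshold=1.0):
    """Return the names whose absolute residual exceeds `threshold` log units.

    The threshold is an assumed, illustrative cut-off, not a value from the paper.
    """
    return [
        name
        for name, p, m in zip(names, predicted, measured)
        if abs(p - m) > threshold
    ]


# Made-up illustrative values, not data from the paper.
names = ["compound_a", "compound_b", "compound_c", "compound_d"]
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 6.0]

print("RMSE:", rmse(predicted, measured))
print("Candidates for manual review:", flag_suspects(names, predicted, measured))
```

A large residual only marks an entry as worth a second look. It may point to a measurement or transcription error, but it can equally reflect a weakness of the model for that chemical class.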


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf0e/9814212/984580fd8def/42004_2021_528_Fig1_HTML.jpg
