Comparison of logP and logD correction models trained with public and proprietary data sets.

Affiliation

Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA.

Publication information

J Comput Aided Mol Des. 2022 Mar;36(3):253-262. doi: 10.1007/s10822-022-00450-9. Epub 2022 Apr 1.

Abstract

In drug discovery, the octanol/water partition and distribution coefficients, logP and logD, are widely used as metrics of molecular lipophilicity, which in turn strongly influences the bioactivity and bioavailability of potential drugs. A variety of established methods, mostly fragment- or atom-based, exist to calculate logP, whereas logD prediction generally relies on calculated logP and pKa to estimate the neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations that generally lead to systematic errors for chemically related molecules, while pKa estimation is generally more difficult because of the interplay of electronic, inductive and conjugation effects in ionizable moieties. We propose an integrated machine learning QSAR modeling approach that predicts logD by training on experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function relative to the ClogD calculated by the software, we build a correction model that incorporates both the software descriptors and the available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions beyond that of the commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
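The abstract's logD estimate rests on the standard pH-partition (Henderson-Hasselbalch) relation: for a monoprotic acid, logD = logP - log10(1 + 10^(pH - pKa)), and for a monoprotic base the exponent becomes pKa - pH. The Python sketch below shows that relation, its inverse (recovering logP from a logD prediction and a pKa, as the abstract describes), and a correction step learned from experimental data. It is a minimal sketch under stated assumptions, not the authors' implementation. It assumes monoprotic compounds where only the neutral species partitions into octanol and defaults to pH 7.4. It also replaces the paper's machine learning QSAR model with a hypothetical one-descriptor least-squares fit of the residual between experimental logD and software ClogD. All function names are hypothetical, and the toy numbers are illustrative, not data from the paper.

```python
import math
from typing import Sequence, Tuple


def ionization_penalty(pka: float, ph: float, kind: str) -> float:
    """log10(1 + 10^x): how far logD falls below logP for a monoprotic compound."""
    if kind == "acid":
        exponent = ph - pka  # acids ionize as pH rises above pKa
    elif kind == "base":
        exponent = pka - ph  # bases ionize as pH falls below pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return math.log10(1.0 + 10.0 ** exponent)


def logd_from_logp(logp: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    # Assumption: monoprotic compound, only the neutral species partitions into octanol.
    return logp - ionization_penalty(pka, ph, kind)


def logp_from_logd(logd: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    # Inverse of logd_from_logp: back out logP from a (corrected) logD and a pKa.
    return logd + ionization_penalty(pka, ph, kind)


def fit_residual_correction(
    descriptor: Sequence[float], clogd: Sequence[float], logd_exp: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of (experimental logD - ClogD) against one descriptor.

    Hypothetical stand-in for the paper's ML model; returns (slope, intercept).
    """
    n = len(descriptor)
    if n < 2 or len(clogd) != n or len(logd_exp) != n:
        raise ValueError("need at least two aligned data points")
    residuals = [e - c for e, c in zip(logd_exp, clogd)]
    x_mean = sum(descriptor) / n
    r_mean = sum(residuals) / n
    sxx = sum((x - x_mean) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor has zero variance")
    sxr = sum((x - x_mean) * (r - r_mean) for x, r in zip(descriptor, residuals))
    slope = sxr / sxx
    return slope, r_mean - slope * x_mean


def corrected_logd(clogd: float, descriptor: float, model: Tuple[float, float]) -> float:
    slope, intercept = model
    # Software ClogD plus the learned residual correction.
    return clogd + intercept + slope * descriptor


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the paper.
    model = fit_residual_correction([1.0, 2.0, 3.0, 4.0], [0.0] * 4, [1.5, 2.0, 2.5, 3.0])
    logd = corrected_logd(1.0, 5.0, model)
    print(model, logd, logp_from_logd(logd, pka=9.0, kind="base"))
```

A practical correction model would use many descriptors and a more flexible learner; the single-descriptor linear fit only keeps the example short and checkable. Multiprotic and zwitterionic compounds need a fuller speciation model than the single penalty term used here.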
