Pirogov Russian National Research Medical University, Ostrovitianov str. 1, Moscow, 117997, Russia.
Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia.
J Chem Inf Model. 2019 Feb 25;59(2):713-730. doi: 10.1021/acs.jcim.8b00617. Epub 2019 Feb 12.
Numerous studies published in recent years report acceptable quantitative structure-activity relationship (QSAR) models built on heterogeneous data. In many cases, the training sets were assembled from compounds tested in different biological assays, which contradicts the view that QSAR modeling should be based on data measured under a single protocol. We attempted to develop approaches for determining how heterogeneous data should be used to build QSAR models from different sets of compounds tested by different experimental methods against the same target for the same endpoint. To this end, more than 100 QSAR models for the IC50 values of ligands interacting with cyclooxygenases 1 and 2 (COX-1/2) and seed lipoxygenase (LOX) were created with the GUSAR software, using data obtained from the ChEMBL database. The QSAR models were tested on an external set that included 26 new thiazolidinone derivatives, which were experimentally tested for COX-1, COX-2, and LOX inhibition. The IC50 values of the derivatives ranged from 89 μM to 26 μM for LOX, from 200 μM to 0.018 μM for COX-1, and from 210 μM to 1 μM for COX-2. This study showed that model accuracy depends on the distribution of IC50 values of low-activity compounds in the training sets. In most cases, QSAR models built on the combined training sets had advantages over QSAR models based on a single publication. We introduced a new method, called "scaling", for combining quantitative data from different experimental studies on the basis of data for reference compounds.
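The abstract does not spell out how "scaling" is computed. The sketch below is only an illustration under a stated assumption: that scaling shifts all pIC50 values from one study by a constant offset, chosen so that a reference compound tested in every study takes the same value it has in a chosen anchor study. The record layout, the function names `pic50` and `scale_by_reference`, and the study and compound identifiers are hypothetical and are not taken from the paper or from GUSAR.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record layout: (study_id, compound_id, IC50 in micromolar).
Record = Tuple[str, str, float]


def pic50(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_um * 1e-6)


def scale_by_reference(
    records: List[Record], reference: str, anchor_study: str
) -> Dict[Tuple[str, str], float]:
    """Shift each study's pIC50 values so its reference compound matches the anchor study.

    Assumption (not from the source): scaling is a constant additive offset in log units.
    Studies that did not test the reference compound are skipped, because no offset
    can be estimated for them.
    """
    # pIC50 of the reference compound in each study that measured it.
    ref_p: Dict[str, float] = {}
    for study, compound, ic50 in records:
        if compound == reference:
            ref_p[study] = pic50(ic50)
    if anchor_study not in ref_p:
        raise ValueError("anchor study must contain the reference compound")

    scaled: Dict[Tuple[str, str], float] = {}
    for study, compound, ic50 in records:
        if study not in ref_p:
            continue  # no reference compound in this study, cannot be scaled
        offset = ref_p[anchor_study] - ref_p[study]
        scaled[(study, compound)] = pic50(ic50) + offset
    return scaled


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    toy = [
        ("A", "REF", 1.0), ("A", "X", 10.0),
        ("B", "REF", 10.0), ("B", "Y", 100.0),
        ("C", "Z", 5.0),
    ]
    for key, value in scale_by_reference(toy, "REF", "A").items():
        print(key, round(value, 3))
```

In this toy run, study B measures the reference compound ten times weaker than study A, so every pIC50 from B is raised by one log unit, and study C is dropped because it lacks the reference compound. A real implementation might instead average offsets over several reference compounds; that choice is left open here.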