Improving (Q)SAR predictions by examining bias in the selection of compounds for experimental testing.

Author information

Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia.

Department of Bioinformatics, Medical-Biological Department, Pirogov Russian National Research Medical University, Moscow, Russia.

Publication information

SAR QSAR Environ Res. 2019 Oct;30(10):759-773. doi: 10.1080/1062936X.2019.1665580. Epub 2019 Sep 24.

DOI:10.1080/1062936X.2019.1665580

PMID:31547686

Abstract

Existing data on structures and biological activities are limited and distributed unevenly across molecular targets and chemical compounds. The question arises whether these data represent an unbiased sample of the general population of chemical-biological interactions. To answer this question, we analyzed ChEMBL data for 87,583 molecules tested against 919 protein targets using supervised and unsupervised approaches. Hierarchical clustering of the Murcko frameworks generated using Chemistry Development Toolkit showed that the available data form a large diffuse cloud without apparent structure. In contrast, PASS-based classifiers could predict whether a compound had been tested against a particular molecular target, regardless of whether it was active. Thus, one may conclude that the selection of chemical compounds for testing against specific targets is biased, probably due to the influence of prior knowledge. We assessed the possibility of improving (Q)SAR predictions using this fact: PASS prediction of the interaction with a particular target has significantly higher accuracy for compounds predicted as tested against the target than for those predicted as untested (average ROC AUC values of about 0.87 and 0.75, respectively). Thus, taking the existing bias in the training set data into account may increase the performance of virtual screening.
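
The stratified evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' PASS implementation: the function names, the 0.5 threshold on the "tested" probability, and the toy records are all assumptions made for illustration. The idea is to split compounds by whether a first classifier predicts them as tested against the target, then compute ROC AUC of the activity prediction within each group.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (active, inactive) pairs in which
    the active compound gets the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_predicted_testing(
    records: Sequence[Tuple[float, float, int]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Dict[str, float]:
    """Each record is (p_tested, activity_score, is_active).
    Compounds are grouped by the predicted 'tested' status, and
    ROC AUC of the activity score is computed per group."""
    groups: Dict[str, Tuple[List[int], List[float]]] = {
        "predicted_tested": ([], []),
        "predicted_untested": ([], []),
    }
    for p_tested, score, is_active in records:
        key = "predicted_tested" if p_tested >= threshold else "predicted_untested"
        groups[key][0].append(is_active)
        groups[key][1].append(score)
    return {key: roc_auc(y, s) for key, (y, s) in groups.items()}


# Made-up toy data for one target: (p_tested, activity_score, is_active).
records = [
    (0.90, 0.8, 1), (0.80, 0.7, 1), (0.85, 0.3, 0), (0.70, 0.2, 0),
    (0.20, 0.6, 1), (0.10, 0.4, 1), (0.30, 0.5, 0), (0.15, 0.1, 0),
]
result = auc_by_predicted_testing(records)
print(result)  # {'predicted_tested': 1.0, 'predicted_untested': 0.75}
```

In a setting like the one the abstract reports, the per-group values would be averaged over targets; the toy numbers above carry no empirical meaning.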


Similar articles

PASS Targets: Ligand-based multi-target computational system based on a public data and naïve Bayes approach.

SAR QSAR Environ Res. 2015;26(10):783-93. doi: 10.1080/1062936X.2015.1078407. Epub 2015 Aug 25.

(Q)SAR Models of HIV-1 Protein Inhibition by Drug-Like Compounds.

Molecules. 2019 Dec 25;25(1):87. doi: 10.3390/molecules25010087.

How to Achieve Better Results Using PASS-Based Virtual Screening: Case Study for Kinase Inhibitors.

Front Chem. 2018 Apr 26;6:133. doi: 10.3389/fchem.2018.00133. eCollection 2018.

Ligand-based virtual screening and in silico design of new antimalarial compounds using nonstochastic and stochastic total and atom-type quadratic maps.

J Chem Inf Model. 2005 Jul-Aug;45(4):1082-100. doi: 10.1021/ci050085t.

In silico target prediction for elucidating the mode of action of herbicides including prospective validation.

J Mol Graph Model. 2017 Jan;71:70-79. doi: 10.1016/j.jmgm.2016.10.021. Epub 2016 Nov 6.

Chemotography for multi-target SAR analysis in the context of biological pathways.

Bioorg Med Chem. 2012 Sep 15;20(18):5416-27. doi: 10.1016/j.bmc.2012.02.034. Epub 2012 Feb 20.

Role of moving average analysis for development of multi-target (Q)SAR models.

Mini Rev Med Chem. 2015;15(8):659-76. doi: 10.2174/1389557515666150219130554.

Neighborhood-based prediction of novel active compounds from SAR matrices.

J Chem Inf Model. 2014 Mar 24;54(3):801-9. doi: 10.1021/ci5000483. Epub 2014 Mar 12.

Virtual screening of chemical compounds active against breast cancer cell lines based on cell cycle modelling, prediction of cytotoxicity and interaction with targets.

SAR QSAR Environ Res. 2015;26(7-9):595-604. doi: 10.1080/1062936X.2015.1076516. Epub 2015 Sep 11.

Cited by

Discovery of a Novel Non-Narcotic Analgesic Derived from the CL-20 Explosive: Synthesis, Pharmacology, and Target Identification of Thiowurtzine, a Potent Inhibitor of the Opioid Receptors and the Ion Channels.

ACS Omega. 2021 May 31;6(23):15400-15411. doi: 10.1021/acsomega.1c01786. eCollection 2021 Jun 15.
