Generating property-matched decoy molecules using deep learning.

Authors

Imrie Fergus, Bradley Anthony R, Deane Charlotte M

Affiliations

Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK.

Exscientia Ltd, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, UK.

Publication details

Bioinformatics. 2021 Aug 9;37(15):2134-2141. doi: 10.1093/bioinformatics/btab080.

DOI:10.1093/bioinformatics/btab080

PMID:33532838

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8352508/

Abstract

MOTIVATION

An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalization and hinders virtual screening method development.

RESULTS

We have developed a deep learning method (DeepCoy) that generates decoys to a user's preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules' physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with AutoDock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63.
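As a rough illustration of the two kinds of figures quoted above, the sketch below computes (a) the relative reduction implied by the mean DOE scores and (b) an AUC ROC from docking scores using the pairwise-ranking definition. It is not taken from the DeepCoy code: the function names and the toy docking scores are hypothetical, and the convention that lower docking scores are better is an assumption. The reductions of the mean values come out at about 80.7% and 65.1%, close to but not identical with the reported 81% and 66%, presumably because the abstract averages per-target improvements rather than comparing the means.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outranks a random decoy.

    Assumes lower scores are better (as with docking energies);
    ties count as half a win.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Mean DOE scores quoted in the abstract (original decoys -> DeepCoy decoys).
dude = relative_reduction(0.166, 0.032)
dekois = relative_reduction(0.109, 0.038)
print(f"DUD-E: {dude:.1%}, DEKOIS 2.0: {dekois:.1%}")

# Hypothetical docking scores, invented purely for illustration.
toy_actives = [-9.1, -8.4, -7.9]
toy_decoys = [-8.8, -7.5, -7.0, -6.2]
toy_auc = roc_auc(toy_actives, toy_decoys)
print(f"toy AUC ROC: {toy_auc:.2f}")
```

An AUC ROC of 0.5 corresponds to random ranking, so a drop from 0.70 to 0.63 means the docking scores separate actives from the generated decoys less well than from the original ones.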

AVAILABILITY AND IMPLEMENTATION

The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

RADER: a RApid DEcoy Retriever to facilitate decoy based assessment of virtual screening.

Bioinformatics. 2017 Apr 15;33(8):1235-1237. doi: 10.1093/bioinformatics/btw783.

Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance.

J Cheminform. 2016 Oct 17;8:56. doi: 10.1186/s13321-016-0167-x. eCollection 2016.

Evaluation and optimization of virtual screening workflows with DEKOIS 2.0--a public library of challenging docking benchmark sets.

J Chem Inf Model. 2013 Jun 24;53(6):1447-62. doi: 10.1021/ci400115b. Epub 2013 Jun 12.

Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening.

PLoS One. 2019 Aug 20;14(8):e0220113. doi: 10.1371/journal.pone.0220113. eCollection 2019.

Property-Unmatched Decoys in Docking Benchmarks.

J Chem Inf Model. 2021 Feb 22;61(2):699-714. doi: 10.1021/acs.jcim.0c00598. Epub 2021 Jan 25.

Ligity: A Non-Superpositional, Knowledge-Based Approach to Virtual Screening.

J Chem Inf Model. 2019 Jun 24;59(6):2600-2616. doi: 10.1021/acs.jcim.8b00779. Epub 2019 Jun 4.

Virtual decoy sets for molecular docking benchmarks.

J Chem Inf Model. 2011 Feb 28;51(2):196-202. doi: 10.1021/ci100374f. Epub 2011 Jan 5.

DEKOIS: demanding evaluation kits for objective in silico screening--a versatile tool for benchmarking docking programs and scoring functions.

J Chem Inf Model. 2011 Oct 24;51(10):2650-65. doi: 10.1021/ci2001549. Epub 2011 Aug 18.

DecoyFinder: an easy-to-use python GUI application for building target-specific decoy sets.

Bioinformatics. 2012 Jun 15;28(12):1661-2. doi: 10.1093/bioinformatics/bts249. Epub 2012 Apr 26.

Cited by

Graph convolutional neural networks improved target-specific scoring functions for cGAS and kRAS in virtual screening.

Comput Struct Biotechnol J. 2025 May 23;27:2176-2185. doi: 10.1016/j.csbj.2025.05.023. eCollection 2025.

SPLIF-Enhanced Attention-Driven 3D CNNs for Precise and Reliable Protein-Ligand Interaction Modeling for METTL3.

ACS Omega. 2025 Apr 16;10(16):16748-16761. doi: 10.1021/acsomega.5c00538. eCollection 2025 Apr 29.

Development of an Integrated Computational Pipeline for PARP-1 Inhibitor Screening Using Hybrid Virtual Screening and Molecular Dynamics Simulations.

ChemistryOpen. 2025 Aug;14(8):e202500021. doi: 10.1002/open.202500021. Epub 2025 Apr 28.

Discovery of Novel DDR1 Inhibitors through a Hybrid Virtual Screening Pipeline, Biological Evaluation and Molecular Dynamics Simulations.

ACS Med Chem Lett. 2025 Mar 17;16(4):602-610. doi: 10.1021/acsmedchemlett.4c00634. eCollection 2025 Apr 10.

InertDB as a generative AI-expanded resource of biologically inactive small molecules from PubChem.

J Cheminform. 2025 Apr 10;17(1):49. doi: 10.1186/s13321-025-00999-1.

Machine Learning-Driven Discovery of Structurally Related Natural Products as Activators of the Cardiac Calcium Pump SERCA2a.

ChemMedChem. 2025 May 5;20(9):e202400913. doi: 10.1002/cmdc.202400913. Epub 2025 Feb 6.

Discovery of Vascular Endothelial Growth Factor Receptor 2 Inhibitors Employing Junction Tree Variational Autoencoder with Bayesian Optimization and Gradient Ascent.

ACS Omega. 2024 Nov 12;9(47):47180-47193. doi: 10.1021/acsomega.4c07689. eCollection 2024 Nov 26.

Accelerated hit identification with target evaluation, deep learning and automated labs: prospective validation in IRAK1.

J Cheminform. 2024 Nov 14;16(1):127. doi: 10.1186/s13321-024-00914-0.

In Silico Exploration of Novel EGFR Kinase Mutant-Selective Inhibitors Using a Hybrid Computational Approach.

Pharmaceuticals (Basel). 2024 Aug 23;17(9):1107. doi: 10.3390/ph17091107.

Innovative virtual screening of PD-L1 inhibitors: the synergy of molecular similarity, neural networks and GNINA docking.

Future Med Chem. 2024;16(20):2107-2118. doi: 10.1080/17568919.2024.2389773. Epub 2024 Sep 4.
