Machine learning classification can reduce false positives in structure-based virtual screening.

Affiliations

Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111.

Center for Computational Biology, University of Kansas, Lawrence, KS 66045.

Publication information

Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18477-18488. doi: 10.1073/pnas.2000585117. Epub 2020 Jul 15.

DOI: 10.1073/pnas.2000585117

PMID: 32669436

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7414157/

Abstract

With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient attention to the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has an IC50 of 280 nM, corresponding to a Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and using vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
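The abstract quotes both an IC50 and a Ki for the most potent hit but does not say how one was converted into the other. As a rough illustration only, the sketch below assumes a competitive inhibitor and the Cheng-Prusoff relation, Ki = IC50 / (1 + [S]/Km), and back-calculates the substrate-to-Km ratio that would make the two quoted values consistent. The function names are hypothetical and the assay conditions are not taken from the paper; the block also works out the fraction of tested compounds with IC50 better than 50 μM.

```python
def ki_from_ic50(ic50_nm: float, substrate_over_km: float) -> float:
    """Cheng-Prusoff for a competitive inhibitor: Ki = IC50 / (1 + [S]/Km).

    Assumption: competitive inhibition; the abstract does not state the model used.
    """
    return ic50_nm / (1.0 + substrate_over_km)


def implied_substrate_over_km(ic50_nm: float, ki_nm: float) -> float:
    """Invert Cheng-Prusoff to get the [S]/Km ratio implied by an IC50/Ki pair."""
    return ic50_nm / ki_nm - 1.0


# Values quoted in the abstract for the most potent hit (both in nM).
ratio = implied_substrate_over_km(280.0, 173.0)

# 10 of the 23 tested compounds had IC50 better than 50 uM.
hit_fraction = 10 / 23

print(f"implied [S]/Km = {ratio:.2f}")
print(f"fraction with IC50 < 50 uM = {hit_fraction:.1%}")
```

Under these assumptions the implied [S]/Km is about 0.62, and roughly 43.5% of the tested compounds fall below the 50 μM threshold, compared with the roughly 12% hit rate the abstract cites for typical virtual screens.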
