Accelerating high-throughput virtual screening through molecular pool-based active learning.

Authors

Graff David E, Shakhnovich Eugene I, Coley Connor W

Affiliations

Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA.

Department of Chemical Engineering, MIT, Cambridge, MA, USA.

Publication information

Chem Sci. 2021 Apr 29;12(22):7866-7881. doi: 10.1039/d0sc06805e.

DOI:10.1039/d0sc06805e

PMID:34168840

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8188596/

Abstract

Structure-based virtual screening is an important tool in early-stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of 10^9 molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques, previously employed in other scientific discovery problems, can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we explore the application of these techniques to computational docking datasets and assess the impact of surrogate model architecture, acquisition function, and acquisition batch size on optimization performance. We observe significant reductions in computational costs; for example, using a directed-message passing neural network we can identify 94.8% or 89.3% of the top-50 000 ligands in a 100M member library after testing only 2.4% of candidate ligands using an upper confidence bound or greedy acquisition strategy, respectively. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
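
The loop described in the abstract (train a surrogate on the scored subset, rank the remaining pool with an acquisition function, score the next batch, repeat) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the surrogate is a bootstrap ensemble of ridge regressors standing in for the neural network models named above, the "library" is random feature vectors, the objective is a synthetic score where higher is better (a real docking score would typically be negated), and all function names and parameters are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression with a bias column; returns the weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def ensemble_predict(X_train, y_train, X_pool, rng, n_models=8):
    """Bootstrap ensemble: the mean is the prediction, the std a crude uncertainty."""
    Xb_pool = np.hstack([X_pool, np.ones((len(X_pool), 1))])
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        preds.append(Xb_pool @ fit_ridge(X_train[idx], y_train[idx]))
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.std(axis=0)


def greedy(mean, std):
    """Greedy acquisition: rank candidates by predicted score alone."""
    return mean


def ucb(mean, std, beta=2.0):
    """Upper confidence bound: predicted score plus an uncertainty bonus."""
    return mean + beta * std


def pool_search(X, score_fn, acquisition, init_size=100, batch_size=100,
                n_rounds=5, seed=0):
    """Pool-based search; score_fn maps an index array to objective values."""
    rng = np.random.default_rng(seed)
    n = len(X)
    evaluated = np.zeros(n, dtype=bool)
    scores = np.full(n, np.nan)
    first = rng.choice(n, size=init_size, replace=False)  # random initial batch
    scores[first] = score_fn(first)
    evaluated[first] = True
    for _ in range(n_rounds):
        mean, std = ensemble_predict(X[evaluated], scores[evaluated], X, rng)
        utility = np.array(acquisition(mean, std), dtype=float)
        utility[evaluated] = -np.inf  # never re-select an already scored item
        batch = np.argsort(utility)[-batch_size:]
        scores[batch] = score_fn(batch)
        evaluated[batch] = True
    return evaluated, scores


def top_k_recall(true_scores, evaluated, k):
    """Fraction of the true top-k pool members that were evaluated."""
    top = np.argsort(true_scores)[-k:]
    return evaluated[top].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 16))  # stand-in for molecular features
    truth = X @ rng.normal(size=16) + 0.1 * rng.normal(size=5000)
    for name, acq in [("greedy", greedy), ("ucb", ucb)]:
        evaluated, _ = pool_search(X, lambda idx: truth[idx], acq)
        print(name, evaluated.mean(), top_k_recall(truth, evaluated, k=50))
```

The quantity printed by `top_k_recall` corresponds to the kind of metric quoted in the abstract: the fraction of the true top-k items recovered after scoring only a fraction of the pool. The toy objective here is nearly linear, so it says nothing about the performance to expect on real docking data.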

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f2a/8188596/86bdb9a17d89/d0sc06805e-f1.jpg

Similar articles

Accelerating high-throughput virtual screening through molecular pool-based active learning.

Chem Sci. 2021 Apr 29;12(22):7866-7881. doi: 10.1039/d0sc06805e.

Deep Learning with Geometry-Enhanced Molecular Representation for Augmentation of Large-Scale Docking-Based Virtual Screening.

J Chem Inf Model. 2023 Nov 13;63(21):6501-6514. doi: 10.1021/acs.jcim.3c01371. Epub 2023 Oct 26.

Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking.

Nat Protoc. 2022 Mar;17(3):672-697. doi: 10.1038/s41596-021-00659-2. Epub 2022 Feb 4.

Regression-Based Active Learning for Accessible Acceleration of Ultra-Large Library Docking.

J Chem Inf Model. 2024 Apr 8;64(7):2612-2623. doi: 10.1021/acs.jcim.3c01661. Epub 2023 Dec 29.

HIt Discovery using docking ENriched by GEnerative Modeling (HIDDEN GEM): A novel computational workflow for accelerated virtual screening of ultra-large chemical libraries.

Mol Inform. 2024 Jan;43(1):e202300207. doi: 10.1002/minf.202300207. Epub 2023 Dec 19.

High throughput virtual screening (HTVS) of peptide library: Technological advancement in ligand discovery.

Eur J Med Chem. 2022 Dec 5;243:114766. doi: 10.1016/j.ejmech.2022.114766. Epub 2022 Sep 13.

Self-Focusing Virtual Screening with Active Design Space Pruning.

J Chem Inf Model. 2022 Aug 22;62(16):3854-3862. doi: 10.1021/acs.jcim.2c00554. Epub 2022 Aug 6.

Accelerating Molecular Docking using Machine Learning Methods.

Mol Inform. 2024 Jun;43(6):e202300167. doi: 10.1002/minf.202300167. Epub 2024 Jun 8.

AMMOS: Automated Molecular Mechanics Optimization tool for in silico Screening.

BMC Bioinformatics. 2008 Oct 16;9:438. doi: 10.1186/1471-2105-9-438.

Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries.

J Chem Inf Model. 2023 Sep 25;63(18):5773-5783. doi: 10.1021/acs.jcim.3c01239. Epub 2023 Sep 1.

Cited by

Optimizing drug design by merging generative AI with a physics-based active learning framework.

Commun Chem. 2025 Aug 8;8(1):238. doi: 10.1038/s42004-025-01635-7.

A bottom-up approach to find lead compounds in expansive chemical spaces.

Commun Chem. 2025 Aug 1;8(1):225. doi: 10.1038/s42004-025-01610-2.

Simulations and active learning enable efficient identification of an experimentally-validated broad coronavirus inhibitor.

Nat Commun. 2025 Jul 29;16(1):6949. doi: 10.1038/s41467-025-62139-5.

AI meets physics in computational structure-based drug discovery for GPCRs.

NPJ Drug Discov. 2025;2(1):16. doi: 10.1038/s44386-025-00019-0. Epub 2025 Jul 3.

Predicting high-fitness viral protein variants with Bayesian active learning and biophysics.

Proc Natl Acad Sci U S A. 2025 Jun 17;122(24):e2503742122. doi: 10.1073/pnas.2503742122. Epub 2025 Jun 9.

Multiscale analysis and optimal glioma therapeutic candidate discovery using the CANDO platform.

bioRxiv. 2025 May 23:2025.05.19.654757. doi: 10.1101/2025.05.19.654757.

Integrating Machine Learning-Based Pose Sampling with Established Scoring Functions for Virtual Screening.

J Chem Inf Model. 2025 May 26;65(10):4833-4843. doi: 10.1021/acs.jcim.5c00380. Epub 2025 May 9.

Automated On-the-Fly Optimization of Resource Allocation for Efficient Free Energy Simulations.

J Chem Inf Model. 2025 May 26;65(10):4932-4951. doi: 10.1021/acs.jcim.4c02107. Epub 2025 May 6.

Synthon-Based Strategies Exploiting Molecular Similarity and Protein-Ligand Interactions for Efficient Screening of Ultra-Large Chemical Libraries.

J Chem Inf Model. 2025 Jul 28;65(14):7569-7583. doi: 10.1021/acs.jcim.5c00222. Epub 2025 Apr 28.

Assessing the Robustness and Scalability of Machine Learning Methods to Accelerate Ultralarge High-Throughput Docking Campaigns.

ACS Omega. 2025 Apr 7;10(15):15598-15609. doi: 10.1021/acsomega.5c00829. eCollection 2025 Apr 22.
