Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries.

Affiliations

School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland.

CSC─IT Center for Science Ltd., Espoo FI-02101, Finland.

Publication information

J Chem Inf Model. 2023 Sep 25;63(18):5773-5783. doi: 10.1021/acs.jcim.3c01239. Epub 2023 Sep 1.

DOI:10.1021/acs.jcim.3c01239

PMID:37655823

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10523430/

Abstract

The emergence of ultra-large screening libraries, filled to the brim with billions of readily available compounds, poses a growing challenge for docking-based virtual screening. Machine learning (ML)-boosted strategies like the tool HASTEN combine rapid ML prediction with the brute-force docking of small fractions of such libraries to increase screening throughput and take on giga-scale libraries. In our case study of an anti-bacterial chaperone and an anti-viral kinase, we first generated a brute-force docking baseline for 1.56 billion compounds in the Enamine REAL lead-like library with the fast Glide high-throughput virtual screening protocol. With HASTEN, we observed robust recall of 90% of the true 1000 top-scoring virtual hits in both targets when docking only 1% of the entire library. This reduction of the required docking experiments by 99% significantly shortens the screening time. In the kinase target, the employment of a hydrogen bonding constraint resulted in a major proportion of unsuccessful docking attempts and hampered ML predictions. We demonstrate the optimization potential in the treatment of failed compounds when performing ML-boosted screening and benchmark and showcase HASTEN as a fast and robust tool in a growing arsenal of approaches to unlock the chemical space covered by giga-scale screening libraries for everyday drug discovery campaigns.
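The two headline figures in the abstract, the docking budget implied by a 1% fraction and the recall of the true top-scoring hits, can be made concrete with a minimal sketch. It is not taken from HASTEN; the function names `docking_budget` and `recall_of_top_k` and the toy compound identifiers and scores are hypothetical, and the only assumption about scoring is the usual docking convention that a more negative score is better.

```python
import heapq

LIBRARY_SIZE = 1_560_000_000   # compounds in the screened library (from the abstract)
DOCKED_FRACTION = 0.01         # fraction of the library actually docked (from the abstract)


def docking_budget(library_size: int, fraction: float) -> int:
    """Number of compounds that get docked for a given library fraction."""
    return round(library_size * fraction)


def recall_of_top_k(true_scores: dict, docked_ids: set, k: int) -> float:
    """Fraction of the k best brute-force hits that the reduced screen docked.

    true_scores maps a compound id to its brute-force docking score.
    Assumption: lower (more negative) scores are better.
    """
    top_k = heapq.nsmallest(k, true_scores, key=true_scores.get)
    found = sum(1 for compound_id in top_k if compound_id in docked_ids)
    return found / k


# Hypothetical toy data: five compounds, three of which were docked.
toy_scores = {"c1": -9.1, "c2": -8.7, "c3": -5.0, "c4": -8.9, "c5": -3.2}
toy_docked = {"c1", "c3", "c4"}

print(docking_budget(LIBRARY_SIZE, DOCKED_FRACTION))  # 15600000
print(recall_of_top_k(toy_scores, toy_docked, k=3))   # 2 of the 3 best were docked
```

With the numbers from the abstract, docking 1% of 1.56 billion compounds corresponds to about 15.6 million docking runs, and a recall of 90% for k = 1000 means that roughly 900 of the 1000 best brute-force hits were among the docked compounds.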

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac7/10523430/30d238f61b23/ci3c01239_0002.jpg
