Regression-Based Active Learning for Accessible Acceleration of Ultra-Large Library Docking.

Affiliations

Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia.

University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.

Publication information

J Chem Inf Model. 2024 Apr 8;64(7):2612-2623. doi: 10.1021/acs.jcim.3c01661. Epub 2023 Dec 29.

Abstract

Structure-based drug discovery is a process for both hit finding and optimization that relies on a validated three-dimensional model of a target biomolecule, used to rationalize the structure-function relationship for this particular target. An ultralarge virtual screening approach has emerged recently for rapid discovery of high-affinity hit compounds, but it requires substantial computational resources. This study shows that active learning with simple linear regression models can accelerate virtual screening, retrieving up to 90% of the top 1% of the docking hit list after docking just 10% of the ligands. The results demonstrate that it is unnecessary to use complex models, such as deep learning approaches, to predict the imprecise results of ligand docking with a low sampling depth. Furthermore, we explore active learning meta-parameters and find that constant batch size models with a simple ensembling method provide the best ligand retrieval rate. Finally, our approach is validated on an ultralarge virtual screening data set, retrieving 70% of the top 0.05% of ligands after screening only 2% of the library. Altogether, this work provides a computationally accessible approach for accelerated virtual screening that can serve as a blueprint for the future design of low-compute agents for exploration of the chemical space via large-scale accelerated docking. With recent breakthroughs in protein structure prediction, this method can significantly increase accessibility for the academic community and aid in the rapid discovery of high-affinity hit compounds for various targets.
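The loop described in the abstract (dock a batch, fit a linear regression on the docked ligands, pick the next batch from the model's best predictions, repeat with a constant batch size) can be illustrated with a minimal sketch. This is not the authors' implementation: the binary "fingerprints", the synthetic docking scores, the ridge penalty, the greedy acquisition rule, and all function names (`ridge_fit`, `active_learning_screen`, `top_fraction_recall`) are assumptions made for illustration, and the ensembling step mentioned in the abstract is omitted.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the bias column is penalized too, for simplicity."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def ridge_predict(X, w):
    """Predict scores from features X and weights w (last entry of w is the bias)."""
    return X @ w[:-1] + w[-1]

def active_learning_screen(X, dock, batch_size, n_iters, seed=0):
    """Greedy active learning with a constant batch size.

    dock: hypothetical callable mapping ligand indices to docking scores
          (lower score = better binder).
    Returns a boolean mask of the ligands that were docked.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    docked = np.zeros(n, dtype=bool)
    scores = np.full(n, np.nan)

    # Iteration 1: a random batch seeds the model.
    first = rng.choice(n, size=batch_size, replace=False)
    docked[first] = True
    scores[first] = dock(first)

    for _ in range(n_iters - 1):
        w = ridge_fit(X[docked], scores[docked])
        pred = ridge_predict(X, w)
        pred[docked] = np.inf                 # never re-dock a ligand
        nxt = np.argsort(pred)[:batch_size]   # best predicted scores first
        docked[nxt] = True
        scores[nxt] = dock(nxt)
    return docked

def top_fraction_recall(true_scores, docked, fraction):
    """Share of the true top-`fraction` ligands that ended up in the docked set."""
    k = max(1, int(round(fraction * len(true_scores))))
    top = np.argsort(true_scores)[:k]
    return float(docked[top].mean())

# Synthetic stand-in for a library: random bit vectors and noisy linear scores.
rng = np.random.default_rng(42)
n_ligands, n_bits = 20000, 64
X = rng.integers(0, 2, size=(n_ligands, n_bits)).astype(float)
true_scores = X @ rng.normal(size=n_bits) + rng.normal(scale=1.0, size=n_ligands)

# 5 iterations x 400 ligands = 2000 docked, i.e. 10% of the synthetic library.
docked = active_learning_screen(X, lambda idx: true_scores[idx],
                                batch_size=400, n_iters=5)
recall = top_fraction_recall(true_scores, docked, fraction=0.01)
print(f"docked {docked.mean():.0%} of the library, top-1% recall = {recall:.2f}")
```

Because the synthetic scores are linear in the features by construction, the recall printed here says nothing about real docking campaigns; the sketch only shows how the retrieval metric quoted in the abstract (fraction of the top 1% recovered after docking 10%) would be computed.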

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e2a/11005039/996ed8581096/ci3c01661_0001.jpg
