Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets.

Authors

Li Hongjian, Leung Kwong-Sak, Wong Man-Hon, Ballester Pedro J

Affiliations

Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Publication information

Mol Inform. 2015 Feb;34(2-3):115-26. doi: 10.1002/minf.201400132. Epub 2015 Feb 12.

DOI:10.1002/minf.201400132

PMID:27490034

Abstract

There is a growing body of evidence showing that machine learning regression results in more accurate structure-based prediction of protein-ligand binding affinity. Docking methods that aim at optimizing the affinity of ligands for a target rely on how accurate their predicted ranking is. However, despite their proven advantages, machine-learning scoring functions are still not widely applied. This seems to be due to insufficient understanding of their properties and the lack of user-friendly software implementing them. Here we present a study where the accuracy of AutoDock Vina, arguably the most commonly-used docking software, is strongly improved by following a machine learning approach. We also analyse the factors that are responsible for this improvement and their generality. Most importantly, with the help of a proposed benchmark, we demonstrate that this improvement will be larger as more data becomes available for training Random Forest models, as regression models implying additive functional forms do not improve with more training data. We discuss how the latter opens the door to new opportunities in scoring function development. In order to facilitate the translation of this advance to enhance structure-based molecular design, we provide software to directly re-score Vina-generated poses and thus strongly improve their predicted binding affinity. The software is available at http://istar.cse.cuhk.edu.hk/rf-score-3.tgz and http://crcm.marseille.inserm.fr/fileadmin/rf-score-3.tgz.
