A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking.

Affiliation

Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.

Publication information

Bioinformatics. 2010 May 1;26(9):1169-75. doi: 10.1093/bioinformatics/btq112. Epub 2010 Mar 17.

DOI:10.1093/bioinformatics/btq112

PMID:20236947

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3524828/

Abstract

MOTIVATION

Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex (which also include parameters fitted to experimental or simulation data) and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions.

RESULTS

We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.
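As a rough illustration of the idea described above, the following minimal sketch fits a small random forest regressor in plain Python and reports test error for two training set sizes. Everything in it is an assumption made for illustration: the integer feature vectors and the "affinity" formula in `make_data` are synthetic, and the helper names (`build_tree`, `fit_forest`, `predict`) are hypothetical. It is not the RF-Score implementation and does not use the PDBbind data.

```python
import random

def avg(values):
    return sum(values) / len(values)

def make_data(n, rng, n_features=5):
    """Synthetic stand-in data: integer descriptors and a made-up nonlinear target."""
    rows = [[rng.randint(0, 9) for _ in range(n_features)] for _ in range(n)]
    ys = [0.5 * r[0] + (2.0 if r[1] > 4 and r[2] > 4 else 0.0) + rng.gauss(0, 0.3)
          for r in rows]
    return rows, ys

def build_tree(rows, ys, depth, n_try, rng, min_leaf=5):
    """Grow one regression tree; each split looks at n_try randomly chosen features."""
    if depth == 0 or len(rows) < 2 * min_leaf:
        return avg(ys)  # leaf: mean target of the samples that reached it
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[f] for r in rows})[1:]:  # candidate thresholds
            left = [y for r, y in zip(rows, ys) if r[f] < t]
            right = [y for r, y in zip(rows, ys) if r[f] >= t]
            if min(len(left), len(right)) < min_leaf:
                continue
            ml, mr = avg(left), avg(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return avg(ys)
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, ys) if r[f] < t]
    hi = [(r, y) for r, y in zip(rows, ys) if r[f] >= t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], depth - 1, n_try, rng, min_leaf),
            build_tree([r for r, _ in hi], [y for _, y in hi], depth - 1, n_try, rng, min_leaf))

def predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def fit_forest(rows, ys, n_trees=20, depth=5, n_try=3, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                                 depth, n_try, rng))
    return forest

def predict(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return avg([predict_tree(tree, row) for tree in forest])

def rmse(pred, true):
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5

if __name__ == "__main__":
    rng = random.Random(1)
    test_x, test_y = make_data(200, rng)
    for n_train in (50, 400):
        x, y = make_data(n_train, rng)
        forest = fit_forest(x, y)
        err = rmse([predict(forest, r) for r in test_x], test_y)
        print(f"n_train={n_train:4d}  test RMSE={err:.2f}")
```

No functional form linking features to the target is written into the model; the trees partition the feature space directly from the data, which is the sense in which the approach is non-parametric. Comparing the two printed errors gives a rough feel for how such a model responds to more training data, though behaviour on this toy data says nothing about real complexes.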

CONTACT

pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
