Effects of data quality and quantity on deep learning for protein-ligand binding affinity prediction.

Affiliations

School of Health, Medical and Applied Sciences, Central Queensland University, Bundaberg, Queensland 4670, Australia.

Institute for Glycomics, Griffith University, Southport, Queensland 4222, Australia.

Publication information

Bioorg Med Chem. 2022 Oct 15;72:117003. doi: 10.1016/j.bmc.2022.117003. Epub 2022 Sep 9.

DOI:10.1016/j.bmc.2022.117003

PMID:36103795

Abstract

Prediction of protein-ligand binding affinities is crucial for computational drug discovery. A number of deep learning approaches have been developed in recent years to improve the accuracy of such affinity prediction. While the predictive power of these systems has advanced to some degree, depending on the datasets used for model training and testing, the effects of the quality and quantity of the underlying data have not been thoroughly examined. In this study, we employed erroneous datasets and data subsets of different sizes, created from one of the largest databases of experimental binding affinities, to train and evaluate a deep learning system based on convolutional neural networks. Our results show that data quality and quantity have significant impacts on the prediction performance of trained models. Depending on the variations in data quality and quantity, the performance discrepancies can be comparable to or even larger than those observed among different deep learning approaches. In particular, the presence of proteins in the training data leads to a dramatic increase in prediction accuracy. This implies that continued accumulation of high-quality affinity data, especially for new protein targets, is indispensable for improving deep learning models to better predict protein-ligand binding affinities.

Similar articles

AK-Score: Accurate Protein-Ligand Binding Affinity Prediction Using an Ensemble of 3D-Convolutional Neural Networks.

Int J Mol Sci. 2020 Nov 10;21(22):8424. doi: 10.3390/ijms21228424.

Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities.

PLoS One. 2021 Apr 8;16(4):e0249404. doi: 10.1371/journal.pone.0249404. eCollection 2021.

Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data.

BMC Bioinformatics. 2017 Mar 23;18(Suppl 5):102. doi: 10.1186/s12859-017-1533-z.

Machine learning models for drug-target interactions: current knowledge and future directions.

Drug Discov Today. 2020 Apr;25(4):748-756. doi: 10.1016/j.drudis.2020.03.003. Epub 2020 Mar 12.

A New Hybrid Neural Network Deep Learning Method for Protein-Ligand Binding Affinity Prediction and De Novo Drug Design.

Int J Mol Sci. 2022 Nov 11;23(22):13912. doi: 10.3390/ijms232213912.

DeepDTA: deep drug-target binding affinity prediction.

Bioinformatics. 2018 Sep 1;34(17):i821-i829. doi: 10.1093/bioinformatics/bty593.

Predicting protein-ligand binding residues with deep convolutional neural networks.

BMC Bioinformatics. 2019 Feb 26;20(1):93. doi: 10.1186/s12859-019-2672-1.

Significance of Data Selection in Deep Learning for Reliable Binding Mode Prediction of Ligands in the Active Site of CYP3A4.

Chem Pharm Bull (Tokyo). 2019 Nov 1;67(11):1183-1190. doi: 10.1248/cpb.c19-00443. Epub 2019 Aug 17.

Binding affinity prediction for protein-ligand complex using deep attention mechanism based on intermolecular interactions.

BMC Bioinformatics. 2021 Nov 8;22(1):542. doi: 10.1186/s12859-021-04466-0.

Cited by

In Silico ADME Methods Used in the Evaluation of Natural Products.

Pharmaceutics. 2025 Jul 31;17(8):1002. doi: 10.3390/pharmaceutics17081002.

Decoding the effects of mutation on protein interactions using machine learning.

Biophys Rev (Melville). 2025 Feb 21;6(1):011307. doi: 10.1063/5.0249920. eCollection 2025 Mar.

Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

J Chem Inf Model. 2025 Mar 10;65(5):2191-2213. doi: 10.1021/acs.jcim.4c01907. Epub 2025 Feb 24.

Advances in Protein-Ligand Binding Affinity Prediction via Deep Learning: A Comprehensive Study of Datasets, Data Preprocessing Techniques, and Model Architectures.

Curr Drug Targets. 2024;25(15):1041-1065. doi: 10.2174/0113894501330963240905083020.

MDverse, shedding light on the dark matter of molecular dynamics simulations.

Elife. 2024 Aug 30;12:RP90061. doi: 10.7554/eLife.90061.

The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review.

NPJ Digit Med. 2024 Aug 3;7(1):203. doi: 10.1038/s41746-024-01196-4.

Pathways to a Shiny Future: Building the Foundation for Computational Physical Chemistry and Biophysics in 2050.

ACS Phys Chem Au. 2024 Apr 4;4(4):302-313. doi: 10.1021/acsphyschemau.4c00003. eCollection 2024 Jul 24.

Machine learning insights into predicting biogas separation in metal-organic frameworks.

Commun Chem. 2024 May 8;7(1):102. doi: 10.1038/s42004-024-01166-7.

Deep learning in bioinformatics.

Turk J Biol. 2023 Dec 18;47(6):366-382. doi: 10.55730/1300-0152.2671. eCollection 2023.

The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks.

Int J Mol Sci. 2023 Nov 9;24(22):16120. doi: 10.3390/ijms242216120.
