Multi-task bioassay pre-training for protein-ligand binding affinity prediction.

Affiliations

Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China.

Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China.

Publication information

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad451.

DOI:10.1093/bib/bbad451

PMID:38084920

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10783875/

Abstract

Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by taking the three-dimensional (3D) structure of protein-ligand complexes as input, and have achieved substantial progress. However, because high-quality training data are scarce, the generalization ability of current models remains limited. Although a vast amount of affinity data is available in large-scale databases such as ChEMBL, inconsistent affinity measurement labels (i.e. IC50, Ki, Kd), differing experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models from these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; and (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By using multi-task pre-training to treat the prediction of different affinity labels as different tasks, and by classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from the new ChEMBL-Dock dataset despite its varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. The MBP web server is freely available at: https://huggingface.co/spaces/jiaxianustc/mbp.
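The abstract's two training ideas (one task per affinity label type, and ranking classification restricted to samples from the same bioassay) can be illustrated with a small sketch. This is not the authors' implementation: the toy records, the function names (`within_assay_pairs`, `pairwise_ranking_loss`, `multitask_loss`), the sigmoid-based pairwise loss, and the assumption that all samples in one assay share a label type are hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (assay_id, label_type, measured affinity on a log scale).
RECORDS = [
    ("assay_A", "IC50", 6.2),
    ("assay_A", "IC50", 7.5),
    ("assay_A", "IC50", 5.1),
    ("assay_B", "Ki", 8.0),
    ("assay_B", "Ki", 6.9),
]


def within_assay_pairs(records):
    """Yield (i, j, target) for pairs of samples from the same bioassay.

    target is 1 if sample i has the higher measured affinity, else 0.
    Ties are skipped because they carry no ranking signal.
    """
    by_assay = defaultdict(list)
    for idx, (assay_id, _label_type, _affinity) in enumerate(records):
        by_assay[assay_id].append(idx)
    for indices in by_assay.values():
        for i, j in combinations(indices, 2):
            a_i, a_j = records[i][2], records[j][2]
            if a_i != a_j:
                yield i, j, 1 if a_i > a_j else 0


def pairwise_ranking_loss(score_i, score_j, target):
    """Binary cross-entropy on sigmoid(score_i - score_j)."""
    p = 1.0 / (1.0 + math.exp(-(score_i - score_j)))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to keep log() finite
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def multitask_loss(records, scores):
    """Average the ranking loss separately per label type (one "task" each).

    scores maps label_type -> list of per-sample predictions from that
    task's output head (assumed layout, indexed like records).
    """
    losses = defaultdict(list)
    for i, j, target in within_assay_pairs(records):
        label_type = records[i][1]  # assumption: one label type per assay
        head = scores[label_type]
        losses[label_type].append(pairwise_ranking_loss(head[i], head[j], target))
    return {task: sum(vals) / len(vals) for task, vals in losses.items()}
```

Because pairs are only formed inside one assay, affinities measured under different experimental conditions are never compared directly, which is the motivation the abstract gives for ranking within a bioassay.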

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb32/10783875/45744cd29b53/bbad451f1.jpg

Similar articles

Multi-task bioassay pre-training for protein-ligand binding affinity prediction.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad451.

GeneralizedDTA: combining pre-training and multi-task learning to predict drug-target binding affinity for unknown drug discovery.

BMC Bioinformatics. 2022 Sep 7;23(1):367. doi: 10.1186/s12859-022-04905-6.

PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction.

J Chem Inf Model. 2024 Apr 8;64(7):2205-2220. doi: 10.1021/acs.jcim.3c00253. Epub 2023 Jun 15.

Task-Specific Scoring Functions for Predicting Ligand Binding Poses and Affinity and for Screening Enrichment.

J Chem Inf Model. 2018 Jan 22;58(1):119-133. doi: 10.1021/acs.jcim.7b00309. Epub 2017 Dec 20.

ERL-ProLiGraph: Enhanced representation learning on protein-ligand graph structured data for binding affinity prediction.

Mol Inform. 2024 Dec;43(12):e202400044. doi: 10.1002/minf.202400044. Epub 2024 Oct 15.

Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data.

BMC Bioinformatics. 2017 Mar 23;18(Suppl 5):102. doi: 10.1186/s12859-017-1533-z.

MpbPPI: a multi-task pre-training-based equivariant approach for the prediction of the effect of amino acid mutations on protein-protein interactions.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad310.

Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design.

J Chem Inf Model. 2020 Sep 28;60(9):4200-4215. doi: 10.1021/acs.jcim.0c00411. Epub 2020 Sep 10.

Ligand binding affinity prediction with fusion of graph neural networks and 3D structure-based complex graph.

Phys Chem Chem Phys. 2023 Sep 13;25(35):24110-24120. doi: 10.1039/d3cp03651k.

AffinityVAE: A multi-objective model for protein-ligand affinity prediction and drug design.

Comput Biol Chem. 2023 Dec;107:107971. doi: 10.1016/j.compbiolchem.2023.107971. Epub 2023 Oct 11.

Cited by

Assay2Mol: large language model-based drug design using BioAssay context.

ArXiv. 2025 Jul 16:arXiv:2507.12574v1.

Predicting Affinity Through Homology (PATH): Interpretable binding affinity prediction with persistent homology.

PLoS Comput Biol. 2025 Jun 27;21(6):e1013216. doi: 10.1371/journal.pcbi.1013216. eCollection 2025 Jun.

EM-PLA: environment-aware heterogeneous graph-based multimodal protein-ligand binding affinity prediction.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf298.

Learning Universal Representations of Intermolecular Interactions with ATOMICA.

bioRxiv. 2025 Jul 15:2025.04.02.646906. doi: 10.1101/2025.04.02.646906.

References

Relative molecule self-attention transformer.

J Cheminform. 2024 Jan 3;16(1):3. doi: 10.1186/s13321-023-00789-7.

Omics-based deep learning approaches for lung cancer decision-making and therapeutics development.

Brief Funct Genomics. 2024 May 15;23(3):181-192. doi: 10.1093/bfgp/elad031.

PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction.

J Chem Inf Model. 2024 Apr 8;64(7):2205-2220. doi: 10.1021/acs.jcim.3c00253. Epub 2023 Jun 15.

GraphscoreDTA: optimized graph neural network for protein-ligand binding affinity prediction.

Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad340.

Improving drug-target affinity prediction via feature fusion and knowledge distillation.

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad145.

Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions.

Molecules. 2023 Feb 9;28(4):1661. doi: 10.3390/molecules28041661.

Geometric Interaction Graph Neural Network for Predicting Protein-Ligand Binding Affinities from 3D Structures (GIGN).

J Phys Chem Lett. 2023 Mar 2;14(8):2020-2033. doi: 10.1021/acs.jpclett.2c03906. Epub 2023 Feb 16.

Prediction of anticancer peptides based on an ensemble model of deep learning and machine learning using ordinal positional encoding.

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac630.

A novel method for drug-target interaction prediction based on graph transformers model.

BMC Bioinformatics. 2022 Nov 3;23(1):459. doi: 10.1186/s12859-022-04812-w.

Graph-sequence attention and transformer for predicting drug-target affinity.

RSC Adv. 2022 Oct 14;12(45):29525-29534. doi: 10.1039/d2ra05566j. eCollection 2022 Oct 11.
