PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications.

Affiliations

Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.

UM-DAE-Centre For Excellence In Basic Sciences, University of Mumbai, Vidyanagari, Mumbai, India.

Publication information

Sci Data. 2022 Sep 7;9(1):548. doi: 10.1038/s41597-022-01631-9.

DOI:10.1038/s41597-022-01631-9
PMID:36071074
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9451116/
Abstract

Computational methods and recently modern machine learning methods have played a key role in structure-based drug design. Though several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that allow for the development of models for predicting binding affinities better than the state-of-the-art scoring functions are important. For the first time, we have developed a dataset, PLAS-5k comprised of 5000 protein-ligand complexes chosen from PDB database. The dataset consists of binding affinities along with energy components like electrostatic, van der Waals, polar and non-polar solvation energy calculated from molecular dynamics simulations using MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, OnionNet model has been retrained on PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
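
The relationship between the listed energy components and a binding affinity can be illustrated with a minimal sketch. It assumes the common MM-PBSA convention in which the binding energy is approximated as the sum of the electrostatic, van der Waals, polar solvation, and non-polar solvation terms, with entropy ignored. The class name, field names, and all numbers below are hypothetical and chosen for illustration only; they are not the PLAS-5k file schema or values from the dataset.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class EnergyComponents:
    """Hypothetical per-complex record (kcal/mol); not the PLAS-5k schema."""
    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def binding_energy(self) -> float:
        # Assumed MM-PBSA-style sum of the four components; entropy is omitted.
        return (self.electrostatic + self.van_der_waals
                + self.polar_solvation + self.nonpolar_solvation)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented example values, not taken from the dataset.
complexes = [
    EnergyComponents(-20.0, -35.0, 30.0, -4.0),
    EnergyComponents(-10.0, -25.0, 18.0, -3.0),
    EnergyComponents(-5.0, -15.0, 10.0, -2.0),
]
calculated = [c.binding_energy() for c in complexes]
experimental = [-10.5, -8.0, -6.0]  # invented "experimental" affinities

r = pearson_r(calculated, experimental)
print(calculated, round(r, 3))
```

Keeping the components as separate fields, rather than storing only the total, mirrors the point made in the abstract: a model could be trained on or optimized against an individual term as well as the summed affinity.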

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a761/9452495/4c2f1ab4954c/41597_2022_1631_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a761/9452495/e4c8817399bb/41597_2022_1631_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a761/9452495/a88fae7a910b/41597_2022_1631_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a761/9452495/088389480ffc/41597_2022_1631_Fig4_HTML.jpg
