Leak Proof PDBBind: A Reorganized Dataset of Protein-Ligand Complexes for More Generalizable Binding Affinity Prediction.

Authors

Li Jie, Guan Xingyi, Zhang Oufan, Sun Kunyang, Wang Yingze, Bagni Dorian, Head-Gordon Teresa

Publication information

ArXiv. 2024 May 3:arXiv:2308.09639v2.

PMID: 37645037
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10462179/
Abstract

Many physics-based and machine-learned scoring functions (SFs) used to predict protein-ligand binding free energies have been trained on the PDBBind dataset. However, it is debated whether new SFs are actually improving, because the general, refined, and core datasets of PDBBind are cross-contaminated with highly similar proteins and ligands, and SFs trained on them may therefore not perform comparably well when predicting binding for new protein-ligand complexes. In this work we have carefully prepared a cleaned PDBBind dataset of non-covalent binders that is split into training, validation, and test sets to control for data leakage, defined as the presence of proteins and ligands with high sequence and structural similarity across the splits. The resulting leak-proof (LP)-PDBBind data are used to retrain four popular SFs (AutoDock Vina, Random Forest (RF)-Score, InteractionGraphNet (IGN), and DeepDTA) to better test their capabilities when applied to new protein-ligand complexes. In particular, we have formulated a new independent dataset, BDB2020+, by matching high-quality binding free energies from BindingDB with co-crystallized ligand-protein complexes deposited in the PDB since 2020. Across all benchmark results, the models retrained on LP-PDBBind consistently perform better, and IGN in particular is recommended for scoring and ranking applications on new protein-ligand systems.
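
The idea of a similarity-controlled split can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes that pairs of complexes judged "too similar" (for example, by protein sequence identity or ligand fingerprint similarity above some chosen threshold) have already been computed elsewhere, and it simply keeps every group of linked complexes inside a single split. The function names `similarity_clusters` and `leak_free_split`, the 80/10/10 fractions, and the greedy assignment rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def similarity_clusters(ids, similar_pairs):
    """Group complexes into connected components of the 'too similar' graph.

    `similar_pairs` is assumed to be precomputed: each (a, b) pair marks two
    complexes whose proteins or ligands exceed a similarity threshold.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    return list(clusters.values())


def leak_free_split(ids, similar_pairs, fractions=(0.8, 0.1, 0.1)):
    """Assign whole clusters to train/validation/test.

    Because clusters are never divided, no pair listed in `similar_pairs`
    can end up with its two members in different splits.
    """
    clusters = sorted(similarity_clusters(ids, similar_pairs), key=len, reverse=True)
    targets = [f * len(ids) for f in fractions]
    splits = ([], [], [])
    for cluster in clusters:
        # Greedy rule (an assumption): fill the split furthest below its target.
        k = max(range(3), key=lambda j: targets[j] - len(splits[j]))
        splits[k].extend(cluster)
    return splits
```

Calling `leak_free_split(ids, similar_pairs)` returns three lists of identifiers. Large clusters can make the realized split sizes deviate from the requested fractions, which is the usual trade-off when similarity constraints take priority over exact proportions.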
