
A workflow to create a high-quality protein-ligand binding dataset for training, validation, and prediction tasks.

Authors

Wang Yingze, Sun Kunyang, Li Jie, Guan Xingyi, Zhang Oufan, Bagni Dorian, Zhang Yang, Carlson Heather A, Head-Gordon Teresa

Affiliations

Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California Berkeley CA 94720 USA.

Department of Computer Science, School of Computing, National University of Singapore 117417 Singapore.

Publication information

Digit Discov. 2025 Apr 2;4(5):1209-1220. doi: 10.1039/d4dd00357h. eCollection 2025 May 14.

DOI: 10.1039/d4dd00357h
PMID: 40190768
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11967698/

Abstract

Development of scoring functions (SFs) used to predict protein-ligand binding energies requires high-quality 3D structures and binding assay data for training and testing their parameters. In this work, we show that one of the widely used datasets, PDBbind, suffers from several common structural artifacts of both proteins and ligands, which may compromise the accuracy, reliability, and generalizability of the resulting SFs. Therefore, we have developed a series of algorithms organized in a semi-automated workflow, HiQBind-WF, that curates non-covalent protein-ligand datasets to fix these problems. We also used this workflow to create an independent dataset, HiQBind, by matching binding free energies from various sources including BioLiP, Binding MOAD and BindingDB with co-crystallized ligand-protein complexes from the PDB. The resulting HiQBind workflow and dataset are designed to ensure reproducibility and to minimize human intervention, while also being open-source to foster transparency in the improvements made to this important resource for the biology and drug discovery communities.
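
The abstract speaks of binding free energies drawn from assay databases. As a minimal sketch that is not taken from the paper, the Python below shows the standard thermodynamic relation ΔG° = RT ln(Kd / 1 M) and the common pKd = -log10(Kd) label. The function names, the default temperature of 298.15 K, and the choice of kcal/mol are assumptions made for illustration; the relation applies to a dissociation constant, and Ki or IC50 values are not interchangeable with Kd without further assumptions.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
R_KCAL_PER_MOL_K = 1.987204259e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from a dissociation constant.

    kd_molar is Kd in mol/L, referenced to a 1 M standard state.
    Tighter binding (smaller Kd) gives a more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def pkd(kd_molar: float) -> float:
    """Negative base-10 logarithm of Kd (in mol/L), a common affinity label."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return -math.log10(kd_molar)


if __name__ == "__main__":
    kd = 1e-9  # a hypothetical 1 nM binder
    print(f"dG = {binding_free_energy(kd):.2f} kcal/mol, pKd = {pkd(kd):.1f}")
```

For the hypothetical 1 nM binder, this gives roughly -12.3 kcal/mol at 298.15 K, which is why nanomolar ligands are often described as binding with about 12 kcal/mol of free energy.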

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ffe/11967698/c44c13fa197a/d4dd00357h-f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ffe/11967698/54444a48e6d1/d4dd00357h-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ffe/11967698/7d400d49992e/d4dd00357h-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ffe/11967698/87d72fc332cc/d4dd00357h-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ffe/11967698/f48cb155b3e9/d4dd00357h-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ffe/11967698/5192f4e1c4fc/d4dd00357h-f5.jpg

Similar articles

1
A workflow to create a high-quality protein-ligand binding dataset for training, validation, and prediction tasks.
Digit Discov. 2025 Apr 2;4(5):1209-1220. doi: 10.1039/d4dd00357h. eCollection 2025 May 14.
2
A Workflow to Create a High-Quality Protein-Ligand Binding Dataset for Training, Validation, and Prediction Tasks.
ArXiv. 2025 Mar 7:arXiv:2411.01223v2.
3
Leak Proof PDBBind: A Reorganized Dataset of Protein-Ligand Complexes for More Generalizable Binding Affinity Prediction.
ArXiv. 2024 May 3:arXiv:2308.09639v2.
4
BgN-Score and BsN-Score: bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes.
BMC Bioinformatics. 2015;16 Suppl 4(Suppl 4):S8. doi: 10.1186/1471-2105-16-S4-S8. Epub 2015 Feb 23.
5
Comparative evaluation of methods for the prediction of protein-ligand binding sites.
J Cheminform. 2024 Nov 11;16(1):126. doi: 10.1186/s13321-024-00923-z.
6
Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.
Acc Chem Res. 2017 Feb 21;50(2):302-309. doi: 10.1021/acs.accounts.6b00491. Epub 2017 Feb 9.
7
BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions.
Nucleic Acids Res. 2013 Jan;41(Database issue):D1096-103. doi: 10.1093/nar/gks966. Epub 2012 Oct 18.
8
A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):335-47. doi: 10.1109/TCBB.2014.2351824.
9
A comparative assessment of ranking accuracies of conventional and machine-learning-based scoring functions for protein-ligand binding affinity prediction.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1301-13. doi: 10.1109/TCBB.2012.36.
10
Structural artifacts in protein-ligand X-ray structures: implications for the development of docking scoring functions.
J Med Chem. 2009 Sep 24;52(18):5673-84. doi: 10.1021/jm8016464.

Articles citing this work

1
Simpatico: accurate and ultra-fast virtual drug screening with atomic embeddings.
bioRxiv. 2025 Jun 8:2025.06.08.658499. doi: 10.1101/2025.06.08.658499.
