Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets.

Affiliation

Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland.

Publication information

J Chem Inf Model. 2010 Nov 22;50(11):1961-9. doi: 10.1021/ci100264e. Epub 2010 Oct 12.

DOI:10.1021/ci100264e
PMID:20936880
Abstract

With the emergence of large collections of protein-ligand complexes complemented by binding data, as found in PDBbind or BindingMOAD, new opportunities for parametrizing and evaluating scoring functions have arisen. With huge data collections available, it becomes feasible to fit scoring functions in a QSAR style, i.e., by defining protein-ligand interaction descriptors and analyzing them with modern machine-learning methods. As in each data modeling ansatz, care has to be taken to validate the model carefully. Here, we show that there are large differences measured in R (0.77 vs 0.46) or R² (0.59 vs 0.21) for a relatively simple scoring function depending on whether it is validated against the PDBbind core set or validated in a leave-cluster-out cross-validation. If proteins from the same family are present in both the training and validation set, the estimated prediction quality from standard validation techniques looks too optimistic.
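
The difference between the two validation schemes comes down to how the data are split. The following is a minimal sketch of a leave-cluster-out split, assuming each complex carries a protein-family cluster label; the function name `leave_cluster_out_splits` and the example labels are hypothetical and are not taken from the paper. Each fold holds out every complex of one cluster, so no protein family appears in both the training and the validation set.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_cluster_out_splits(
    clusters: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_cluster, train_indices, test_indices), one fold per cluster."""
    # Collect the indices of the complexes that belong to each cluster.
    members: Dict[str, List[int]] = defaultdict(list)
    for index, cluster in enumerate(clusters):
        members[cluster].append(index)
    # One fold per cluster, in sorted order so the folds are reproducible.
    for held_out in sorted(members):
        test = members[held_out]
        train = [i for i, c in enumerate(clusters) if c != held_out]
        yield held_out, train, test


# Hypothetical cluster labels for six complexes (illustration only).
example_clusters = ["kinase", "protease", "kinase", "other", "protease", "kinase"]
folds = list(leave_cluster_out_splits(example_clusters))
for held_out, train, test in folds:
    # No complex from the held-out cluster may remain in the training set.
    assert all(example_clusters[i] != held_out for i in train)
    print(held_out, "train:", train, "test:", test)
```

A model would be refit on `train` and scored on `test` in every fold. The reported pairs are numerically consistent with R² being the square of R (0.77² ≈ 0.59, 0.46² ≈ 0.21).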

Similar articles

1
Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets.
J Chem Inf Model. 2010 Nov 22;50(11):1961-9. doi: 10.1021/ci100264e. Epub 2010 Oct 12.
2
SFCscore(RF): a random forest-based scoring function for improved affinity prediction of protein-ligand complexes.
J Chem Inf Model. 2013 Aug 26;53(8):1923-33. doi: 10.1021/ci400120b. Epub 2013 Jun 10.
3
Comparative assessment of scoring functions on a diverse test set.
J Chem Inf Model. 2009 Apr;49(4):1079-93. doi: 10.1021/ci9000053.
4
SFCscore: scoring functions for affinity prediction of protein-ligand complexes.
Proteins. 2008 Nov 1;73(2):395-419. doi: 10.1002/prot.22058.
5
Prediction of HPLC retention index using artificial neural networks and IGroup E-state indices.
J Chem Inf Model. 2009 Apr;49(4):788-99. doi: 10.1021/ci9000162.
6
Predicting protein-ligand binding affinities using novel geometrical descriptors and machine-learning methods.
J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):699-703. doi: 10.1021/ci034246+.
7
An extensive test of 14 scoring functions using the PDBbind refined set of 800 protein-ligand complexes.
J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):2114-25. doi: 10.1021/ci049733j.
8
Predicting antitrichomonal activity: a computational screening using atom-based bilinear indices and experimental proofs.
Bioorg Med Chem. 2006 Oct 1;14(19):6502-24. doi: 10.1016/j.bmc.2006.06.016. Epub 2006 Jul 27.
9
Using molecular docking, 3D-QSAR, and cluster analysis for screening structurally diverse data sets of pharmacological interest.
J Chem Inf Model. 2008 Oct;48(10):2054-65. doi: 10.1021/ci8001952. Epub 2008 Sep 24.
10
Global free energy scoring functions based on distance-dependent atom-type pair descriptors.
J Chem Inf Model. 2011 Mar 28;51(3):707-20. doi: 10.1021/ci100473d. Epub 2011 Feb 22.

Cited by

1
Relevance of 3D Rotationally Equivariant Neural Networks for Predicting Protein-Ligand Binding Affinities.
Interdiscip Sci. 2025 Aug 14. doi: 10.1007/s12539-025-00745-z.
2
Robustly interrogating machine learning-based scoring functions: what are they learning?
Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf040.
3
Predicting Protein-Ligand Binding Affinity Using Fusion Model of Spatial-Temporal Graph Neural Network and 3D Structure-Based Complex Graph.
Interdiscip Sci. 2025 Jun;17(2):257-276. doi: 10.1007/s12539-024-00644-9. Epub 2024 Nov 14.
4
Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods.
Bioinform Biol Insights. 2024 Nov 4;18:11779322241287120. doi: 10.1177/11779322241287120. eCollection 2024.
5
Multi-task bioassay pre-training for protein-ligand binding affinity prediction.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad451.
6
PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking.
ArXiv. 2023 Jul 26:arXiv:2307.12090v2.
7
Novel Sulfonamide-Triazine Hybrid Derivatives: Docking, Synthesis, and Biological Evaluation as Anticancer Agents.
ACS Omega. 2023 Apr 5;8(15):14247-14263. doi: 10.1021/acsomega.3c01273. eCollection 2023 Apr 18.
8
Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review.
Front Bioinform. 2022 Jun 17;2. doi: 10.3389/fbinf.2022.885983.
9
Binding affinity prediction for protein-ligand complex using deep attention mechanism based on intermolecular interactions.
BMC Bioinformatics. 2021 Nov 8;22(1):542. doi: 10.1186/s12859-021-04466-0.
10
A transferable active-learning strategy for reactive molecular force fields.
Chem Sci. 2021 Jul 5;12(32):10944-10955. doi: 10.1039/d1sc01825f. eCollection 2021 Aug 18.