Benchmarking protein-protein interface predictions: why you should care about protein size.

Author information

Martin Juliette

Affiliation

Bases Moléculaires et Structurales des Systèmes Infectieux, CNRS, UMR 5086; Université Lyon 1, IBCP, 7 passage du Vercors F-69367, France.

Publication information

Proteins. 2014 Jul;82(7):1444-52. doi: 10.1002/prot.24512. Epub 2014 Feb 12.

DOI: 10.1002/prot.24512
PMID: 24420747

Abstract

A number of predictive methods have been developed to predict protein-protein binding sites. Each new method is traditionally benchmarked using sets of protein structures of various sizes, and global statistics are used to assess the quality of the prediction. Little attention has been paid to the potential bias due to protein size on these statistics. Indeed, small proteins involve proportionally more residues at interfaces than large ones. If a predictive method is biased toward small proteins, this can lead to an over-estimation of its performance. Here, we investigate the bias due to the size effect when benchmarking protein-protein interface prediction on the widely used docking benchmark 4.0. First, we simulate random scores that favor small proteins over large ones. Instead of the 0.5 AUC (Area Under the Curve) value expected by chance, these biased scores result in an AUC equal to 0.6 using hypergeometric distributions, and up to 0.65 using constant scores. We then use real prediction results to illustrate how to detect the size bias by shuffling, and subsequently correct it using a simple conversion of the scores into normalized ranks. In addition, we investigate the scores produced by eight published methods and show that they are all affected by the size effect, which can change their relative ranking. The size effect also has an impact on linear combination scores by modifying the relative contributions of each method. In the future, systematic corrections should be applied when benchmarking predictive methods using data sets with mixed protein sizes.
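
The size effect described in the abstract can be reproduced with a small simulation. The sketch below is illustrative only and is not the author's code: the protein size range, the fixed interface size of 20 residues, the `50 / size` score offset, and the function names `auc`, `normalized_ranks`, and `simulate` are all assumptions made for the example. Scores are random within each protein (so they carry no information about which residues are at the interface) but are shifted upward for small proteins. The pooled AUC is then compared before and after converting scores to within-protein normalized ranks, here assumed to be `(rank - 0.5) / n`; the paper's exact normalization may differ.

```python
import random


def auc(scores, labels):
    """Pooled ROC AUC from the Mann-Whitney rank sum (average ranks for ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def normalized_ranks(scores):
    """Rank scores within one protein and scale them into (0, 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        out[i] = (rank - 0.5) / len(scores)
    return out


def simulate(n_proteins=200, interface_size=20, seed=0):
    """Return (raw pooled AUC, pooled AUC after rank normalization)."""
    rng = random.Random(seed)
    raw, corrected, labels = [], [], []
    for _ in range(n_proteins):
        size = rng.randint(50, 500)  # hypothetical range of protein sizes
        # Fixed interface size: small proteins have a larger interface fraction.
        y = [1] * interface_size + [0] * (size - interface_size)
        rng.shuffle(y)
        # Random within a protein, but shifted upward for small proteins.
        s = [rng.random() + 50.0 / size for _ in range(size)]
        raw.extend(s)
        corrected.extend(normalized_ranks(s))
        labels.extend(y)
    return auc(raw, labels), auc(corrected, labels)


if __name__ == "__main__":
    raw_auc, corrected_auc = simulate()
    print(f"pooled AUC, raw scores:       {raw_auc:.3f}")
    print(f"pooled AUC, normalized ranks: {corrected_auc:.3f}")
```

Because the within-protein scores here are random, they play the same role as shuffled predictions: any pooled AUC above 0.5 on the raw scores can only come from protein size. Ranking within each protein removes the between-protein offset, so the pooled AUC should return to roughly 0.5.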

Similar articles

1. Great interactions: How binding incorrect partners can teach us about protein recognition and function.
   Proteins. 2016 Oct;84(10):1408-21. doi: 10.1002/prot.25086. Epub 2016 Jun 24.
2. Blind predictions of protein interfaces by docking calculations in CAPRI.
   Proteins. 2010 Nov 15;78(15):3085-95. doi: 10.1002/prot.22850.
3. ProMate: a structure based prediction program to identify the location of protein-protein binding sites.
   J Mol Biol. 2004 Apr 16;338(1):181-99. doi: 10.1016/j.jmb.2004.02.040.
4. Scoring optimisation of unbound protein-protein docking including protein binding site predictions.
   J Mol Recognit. 2012 Jan;25(1):15-23. doi: 10.1002/jmr.1165.
5. PSSM-based prediction of DNA binding sites in proteins.
   BMC Bioinformatics. 2005 Feb 19;6:33. doi: 10.1186/1471-2105-6-33.
6. Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy.
   Bioinformatics. 2009 Jun 15;25(12):1513-20. doi: 10.1093/bioinformatics/btp240. Epub 2009 Apr 8.
7. Sequence-based prediction of protein interaction sites with an integrative method.
   Bioinformatics. 2009 Mar 1;25(5):585-91. doi: 10.1093/bioinformatics/btp039. Epub 2009 Jan 19.
8. Prediction of the interaction site on the surface of an isolated protein structure by analysis of side chain energy scores.
   Proteins. 2004 Nov 15;57(3):548-57. doi: 10.1002/prot.20238.
9. Lead finder: an approach to improve accuracy of protein-ligand docking, binding energy estimation, and virtual screening.
   J Chem Inf Model. 2008 Dec;48(12):2371-85. doi: 10.1021/ci800166p.

Cited by

1. PIPENN-EMB ensemble net and protein embeddings generalise protein interface prediction beyond homology.
   Sci Rep. 2025 Feb 5;15(1):4391. doi: 10.1038/s41598-025-88445-y.
2. ProB-Site: Protein Binding Site Prediction Using Local Features.
   Cells. 2022 Jul 5;11(13):2117. doi: 10.3390/cells11132117.
3. Deep Learning for Protein-Protein Interaction Site Prediction.
   Methods Mol Biol. 2021;2361:263-288. doi: 10.1007/978-1-0716-1641-3_16.
4. Identification and visualization of protein binding regions with the ArDock server.
   Nucleic Acids Res. 2018 Jul 2;46(W1):W417-W422. doi: 10.1093/nar/gky472.
5. Cryo-EM Data Are Superior to Contact and Interface Information in Integrative Modeling.
   Biophys J. 2016 Feb 23;110(4):785-97. doi: 10.1016/j.bpj.2015.12.038. Epub 2016 Feb 1.
6. Predicting protein interface residues using easily accessible on-line resources.
   Brief Bioinform. 2015 Nov;16(6):1025-34. doi: 10.1093/bib/bbv009. Epub 2015 Mar 21.
7. Algorithmic approaches to protein-protein interaction site prediction.
   Algorithms Mol Biol. 2015 Feb 15;10:7. doi: 10.1186/s13015-015-0033-9. eCollection 2015.
8. Use B-factor related features for accurate classification between protein binding interfaces and crystal packing contacts.
   BMC Bioinformatics. 2014;15 Suppl 16(Suppl 16):S3. doi: 10.1186/1471-2105-15-S16-S3. Epub 2014 Dec 8.