

Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction.

Author information

Handl Julia, Knowles Joshua, Lovell Simon C

Affiliation

Faculty of Life Sciences, University of Manchester, Manchester, UK.

Publication information

Bioinformatics. 2009 May 15;25(10):1271-9. doi: 10.1093/bioinformatics/btp150. Epub 2009 Mar 17.

DOI:10.1093/bioinformatics/btp150
PMID:19297350
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2677743/
Abstract

MOTIVATION

Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. To this ongoing discussion we contribute an empirical assessment of several decoy datasets commonly used in experimental studies.

RESULTS

We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on the type of bias and the evaluation context, sampling biases may lead to either over- or under-estimation of the quality of scoring terms, functions or methods.

AVAILABILITY

Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b33/2677743/10a7fad2f2fc/btp150f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b33/2677743/88a978dc4ddb/btp150f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b33/2677743/954debde8576/btp150f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b33/2677743/9019c2e8de10/btp150f4.jpg

Similar articles

1
Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction.
Bioinformatics. 2009 May 15;25(10):1271-9. doi: 10.1093/bioinformatics/btp150. Epub 2009 Mar 17.
2
Improved protein structure selection using decoy-dependent discriminatory functions.
BMC Struct Biol. 2004 Jun 18;4:8. doi: 10.1186/1472-6807-4-8.
3
Decoy Database Improvement for Protein Folding.
J Comput Biol. 2015 Sep;22(9):823-36. doi: 10.1089/cmb.2015.0116. Epub 2015 Aug 10.
4
How well can we predict native contacts in proteins based on decoy structures and their energies?
Proteins. 2003 Sep 1;52(4):598-608. doi: 10.1002/prot.10444.
5
A decoy set for the thermostable subdomain from chicken villin headpiece, comparison of different free energy estimators.
BMC Bioinformatics. 2005 Dec 14;6:301. doi: 10.1186/1471-2105-6-301.
6
Soft energy function and generic evolutionary method for discriminating native from nonnative protein conformations.
J Comput Chem. 2008 Jul 15;29(9):1364-73. doi: 10.1002/jcc.20897.
7
ANDIS: an atomic angle- and distance-dependent statistical potential for protein structure quality assessment.
BMC Bioinformatics. 2019 Jun 3;20(1):299. doi: 10.1186/s12859-019-2898-y.
8
Integrating Bonded and Nonbonded Potentials in the Knowledge-Based Scoring Function for Protein Structure Prediction.
J Chem Inf Model. 2019 Jun 24;59(6):3080-3090. doi: 10.1021/acs.jcim.9b00057. Epub 2019 May 13.
9
Discrimination of native loop conformations in membrane proteins: decoy library design and evaluation of effective energy scoring functions.
Proteins. 2003 Sep 1;52(4):492-509. doi: 10.1002/prot.10404.
10
Decoy selection for protein structure prediction via extreme gradient boosting and ranking.
BMC Bioinformatics. 2020 Dec 9;21(Suppl 1):189. doi: 10.1186/s12859-020-3523-9.
