Protein profiles: Biases and protocols.

Authors

Urban Gregor, Torrisi Mirko, Magnan Christophe N, Pollastri Gianluca, Baldi Pierre

Affiliations

Department of Computer Science & Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA.

UCD Institute for Discovery, University College Dublin, Dublin, 4, Ireland.

Publication information

Comput Struct Biotechnol J. 2020 Aug 27;18:2281-2289. doi: 10.1016/j.csbj.2020.08.015. eCollection 2020.

DOI: 10.1016/j.csbj.2020.08.015
PMID: 32994887
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7486441/
Abstract

The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions. While profiles can enhance structural signals, their role remains somewhat surprising as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons since profiles may facilitate the bleeding of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy for profile-based predictors, over sequence-based predictors, strongly relies on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be mitigated by implementing a new protocol (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also reduces the impact of choosing a given similarity cutoff when selecting test proteins.
The EVALpro program is available in the SCRATCH suite (www.scratch.proteomics.ics.uci.edu) and can be downloaded at: www.download.igb.uci.edu/#evalpro.
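
The idea of reporting accuracy as a function of train/test similarity, rather than as a single overall number, can be illustrated with a small Python sketch. This is not the EVALpro program and does not reproduce its method: the function name `accuracy_by_similarity`, the bin edges, and the toy records are all hypothetical, and each test protein is assumed to arrive with a precomputed score in [0, 1] for its highest profile similarity to any training protein (how that score is computed is left open here).

```python
from bisect import bisect_left
from collections import defaultdict


def accuracy_by_similarity(test_records, bin_edges):
    """Per-bin residue-level accuracy of secondary structure predictions.

    test_records: iterable of (similarity, predicted, observed), where
        similarity is an assumed, precomputed train/test profile similarity
        and predicted/observed are equal-length strings of class labels.
    bin_edges: ascending upper bounds of the similarity bins.
    Returns {upper_edge: accuracy} for every bin that contains residues.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for similarity, predicted, observed in test_records:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed strings differ in length")
        # First bin whose upper edge is >= the similarity score.
        index = bisect_left(bin_edges, similarity)
        if index == len(bin_edges):
            raise ValueError(f"similarity {similarity} exceeds the last bin edge")
        edge = bin_edges[index]
        correct[edge] += sum(p == o for p, o in zip(predicted, observed))
        total[edge] += len(observed)
    return {e: correct[e] / total[e] for e in sorted(total) if total[e] > 0}


# Toy, made-up records: (similarity, predicted labels, observed labels).
records = [
    (0.10, "HHHC", "HHHC"),
    (0.20, "EECC", "EEHC"),
    (0.60, "CCCC", "CCCC"),
    (0.90, "HHEE", "HHEE"),
]
result = accuracy_by_similarity(records, [0.25, 0.5, 0.75, 1.0])
print(result)  # {0.25: 0.875, 0.75: 1.0, 1.0: 1.0}
```

Two predictors can then be compared bin by bin, so that each is judged on test proteins that are similarly close to its own training set, instead of through one aggregate score that depends on the similarity cutoff used to build the test set.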


Similar articles

1
Protein profiles: Biases and protocols.
Comput Struct Biotechnol J. 2020 Aug 27;18:2281-2289. doi: 10.1016/j.csbj.2020.08.015. eCollection 2020.
2
SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity.
Bioinformatics. 2014 Sep 15;30(18):2592-7. doi: 10.1093/bioinformatics/btu352. Epub 2014 May 24.
3
SSpro/ACCpro 6: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, deep learning and structural similarity.
Bioinformatics. 2022 Mar 28;38(7):2064-2065. doi: 10.1093/bioinformatics/btac019.
4
Effective connectivity profile: a structural representation that evidences the relationship between protein structures and sequences.
Proteins. 2008 Dec;73(4):872-88. doi: 10.1002/prot.22113.
5
SCRATCH: a protein structure and structural feature prediction server.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W72-6. doi: 10.1093/nar/gki396.
6
PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure.
Bioinformatics. 2008 Jun 15;24(12):1459-60. doi: 10.1093/bioinformatics/btn199. Epub 2008 Apr 28.
7
Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles.
Proteins. 2002 May 1;47(2):228-35. doi: 10.1002/prot.10082.
8
Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching.
Proteins. 2006 Mar 15;62(3):617-29. doi: 10.1002/prot.20787.
9
Advancing the Accuracy of Protein Fold Recognition by Utilizing Profiles From Hidden Markov Models.
IEEE Trans Nanobioscience. 2015 Oct;14(7):761-72. doi: 10.1109/TNB.2015.2457906. Epub 2015 Jul 20.
10
Combining prediction of secondary structure and solvent accessibility in proteins.
Proteins. 2005 May 15;59(3):467-75. doi: 10.1002/prot.20441.

Cited by

1
TopEC: prediction of Enzyme Commission classes by 3D graph neural networks and localized 3D protein descriptor.
Nat Commun. 2025 Mar 20;16(1):2737. doi: 10.1038/s41467-025-57324-5.
2
Predictive analyses of regulatory sequences with EUGENe.
Nat Comput Sci. 2023 Nov;3(11):946-956. doi: 10.1038/s43588-023-00544-w. Epub 2023 Nov 16.
3
Deep learning for protein secondary structure prediction: Pre and post-AlphaFold.
Comput Struct Biotechnol J. 2022 Nov 11;20:6271-6286. doi: 10.1016/j.csbj.2022.11.012. eCollection 2022.
4
Navigating the pitfalls of applying machine learning in genomics.
Nat Rev Genet. 2022 Mar;23(3):169-181. doi: 10.1038/s41576-021-00434-9. Epub 2021 Nov 26.

References

1
Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction.
Sci Rep. 2019 Aug 26;9(1):12374. doi: 10.1038/s41598-019-48786-x.
2
The PSIPRED Protein Analysis Workbench: 20 years on.
Nucleic Acids Res. 2019 Jul 2;47(W1):W402-W407. doi: 10.1093/nar/gkz297.
3
Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks.
Bioinformatics. 2019 Jul 15;35(14):2403-2410. doi: 10.1093/bioinformatics/bty1006.
4
Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning.
J Comput Chem. 2018 Oct 5;39(26):2210-2216. doi: 10.1002/jcc.25534. Epub 2018 Oct 14.
5
Protein secondary structure prediction: A survey of the state of the art.
J Mol Graph Model. 2017 Sep;76:379-402. doi: 10.1016/j.jmgm.2017.07.015. Epub 2017 Jul 19.
6
Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility.
Bioinformatics. 2017 Sep 15;33(18):2842-2849. doi: 10.1093/bioinformatics/btx218.
7
Sixty-five years of the long march in protein secondary structure prediction: the final stretch?
Brief Bioinform. 2018 May 1;19(3):482-494. doi: 10.1093/bib/bbw129.
8
CATH: an expanded resource to predict protein function through structure and sequence.
Nucleic Acids Res. 2017 Jan 4;45(D1):D289-D295. doi: 10.1093/nar/gkw1098. Epub 2016 Nov 28.
9
Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning.
Sci Rep. 2015 Jun 22;5:11476. doi: 10.1038/srep11476.
10
JPred4: a protein secondary structure prediction server.
Nucleic Acids Res. 2015 Jul 1;43(W1):W389-94. doi: 10.1093/nar/gkv332. Epub 2015 Apr 16.