The PDB is a covering set of small protein structures.

Authors

Kihara Daisuke, Skolnick Jeffrey

Affiliation

Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington St, Suite 300, Buffalo, NY 14203, USA.

Publication

J Mol Biol. 2003 Dec 5;334(4):793-802. doi: 10.1016/j.jmb.2003.10.027.

DOI: 10.1016/j.jmb.2003.10.027
PMID: 14636603

Abstract

Structure comparisons of all representative proteins have been done. Employing the relative root mean square deviation (RMSD) from native enables the assessment of the statistical significance of structure alignments of different lengths in terms of a Z-score. Two conclusions emerge: first, proteins with their native fold can be distinguished by their Z-score. Second and somewhat surprising, all small proteins up to 100 residues in length have significant structure alignments to other proteins in a different secondary structure and fold class; i.e., 24.0% of them have 60% coverage by a template protein with an RMSD below 3.5 Å and 6.0% have 70% coverage. If the restriction that we align proteins only having different secondary structure types is removed, then in a representative benchmark set of proteins of 200 residues or smaller, 93% can be aligned to a single template structure (with average sequence identity of 9.8%), with an RMSD less than 4 Å, and 79% average coverage. In this sense, the current Protein Data Bank (PDB) is almost a covering set of small protein structures. The length of the aligned region (relative to the whole protein length) does not differ among the top hit proteins, indicating that protein structure space is highly dense. For larger proteins, non-related proteins can cover a significant portion of the structure. Moreover, these top hit proteins are aligned to different parts of the target protein, so that almost the entire molecule can be covered when combined. The number of proteins required to cover a target protein is very small, e.g., the top ten hit proteins can give 90% coverage below an RMSD of 3.5 Å for proteins up to 320 residues long. These results give a new view of the nature of protein structure space, and its implications for protein structure prediction are discussed.
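
The two quantities the abstract relies on, combined coverage of a target by several template hits and a Z-score for an alignment, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input format (sets of aligned target residue indices), and the toy numbers are assumptions made for illustration, and the Z-score shown is the generic standard score rather than the paper's relative-RMSD definition.

```python
from statistics import mean, pstdev


def combined_coverage(target_length, aligned_regions):
    """Fraction of target residues covered by at least one template hit.

    aligned_regions: one iterable of 0-based target residue indices per hit
    (a hypothetical input format, not taken from the paper).
    """
    covered = set()
    for region in aligned_regions:
        # Ignore indices that fall outside the target.
        covered.update(i for i in region if 0 <= i < target_length)
    return len(covered) / target_length


def z_score(value, background):
    """Generic standard score of `value` against a background sample."""
    sigma = pstdev(background)
    if sigma == 0:
        raise ValueError("background has zero variance")
    return (value - mean(background)) / sigma


# Toy data (assumed): a 100-residue target with three hits that align
# to different, partly overlapping parts of the chain.
hits = [range(0, 40), range(30, 70), range(65, 95)]
single = combined_coverage(100, hits[:1])   # best single hit alone
combined = combined_coverage(100, hits)     # union of all three hits
print(single, combined)
```

In this toy case the single hit covers 40% of the target while the union of the three hits covers 95%, which mirrors the abstract's point that hits aligned to different regions can together cover almost the whole molecule.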

Similar articles

1
The PDB is a covering set of small protein structures.
J Mol Biol. 2003 Dec 5;334(4):793-802. doi: 10.1016/j.jmb.2003.10.027.
2
The protein structure prediction problem could be solved using the current PDB library.
Proc Natl Acad Sci U S A. 2005 Jan 25;102(4):1029-34. doi: 10.1073/pnas.0407152101. Epub 2005 Jan 14.
3
Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm.
Proteins. 2004 Aug 15;56(3):502-18. doi: 10.1002/prot.20106.
4
Benchmarking of TASSER in the ab initio limit.
Proteins. 2007 Jul 1;68(1):48-56. doi: 10.1002/prot.21392.
5
TM-align: a protein structure alignment algorithm based on the TM-score.
Nucleic Acids Res. 2005 Apr 22;33(7):2302-9. doi: 10.1093/nar/gki524. Print 2005.
6
Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins.
Biophys J. 2004 Oct;87(4):2647-55. doi: 10.1529/biophysj.104.045385.
7
TOUCHSTONE II: a new approach to ab initio protein structure prediction.
Biophys J. 2003 Aug;85(2):1145-64. doi: 10.1016/S0006-3495(03)74551-2.
8
Can molecular dynamics simulations help in discriminating correct from erroneous protein 3D models?
BMC Bioinformatics. 2008 Jan 7;9:6. doi: 10.1186/1471-2105-9-6.
9
Ab initio modeling of small proteins by iterative TASSER simulations.
BMC Biol. 2007 May 8;5:17. doi: 10.1186/1741-7007-5-17.
10
An alternative view of protein fold space.
Proteins. 2000 Feb 15;38(3):247-60.

Cited by

1
Predicting therapeutic and side effects from drug binding affinities to human proteome structures.
iScience. 2024 May 20;27(6):110032. doi: 10.1016/j.isci.2024.110032. eCollection 2024 Jun 21.
2
Fast protein structure comparison through effective representation learning with contrastive graph neural networks.
PLoS Comput Biol. 2022 Mar 24;18(3):e1009986. doi: 10.1371/journal.pcbi.1009986. eCollection 2022 Mar.
3
Completeness and Consistency in Structural Domain Classifications.
ACS Omega. 2021 Jun 8;6(24):15698-15707. doi: 10.1021/acsomega.1c00950. eCollection 2021 Jun 22.
4
Universal Architectural Concepts Underlying Protein Folding Patterns.
Front Mol Biosci. 2021 Apr 30;7:612920. doi: 10.3389/fmolb.2020.612920. eCollection 2020.
5
Advances in integrative structural biology: Towards understanding protein complexes in their cellular context.
Comput Struct Biotechnol J. 2020 Dec 3;19:214-225. doi: 10.1016/j.csbj.2020.11.052. eCollection 2021.
6
Probabilistic divergence of a template-based modelling methodology from the ideal protocol.
J Mol Model. 2021 Jan 7;27(2):25. doi: 10.1007/s00894-020-04640-w.
7
Are RNA networks scale-free?
J Math Biol. 2020 Apr;80(5):1291-1321. doi: 10.1007/s00285-019-01463-z. Epub 2020 Jan 16.
8
MADOKA: an ultra-fast approach for large-scale protein structure similarity searching.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):662. doi: 10.1186/s12859-019-3235-1.
9
Are protein-protein interfaces special regions on a protein's surface?
J Chem Phys. 2015 Dec 28;143(24):243149. doi: 10.1063/1.4937428.
10
From local structure to a global framework: recognition of protein folds.
J R Soc Interface. 2014 Apr 16;11(95):20131147. doi: 10.1098/rsif.2013.1147. Print 2014 Jun 6.