A 3D sequence-independent representation of the protein data bank.

Authors

Fischer D, Tsai C J, Nussinov R, Wolfson H

Affiliation

Computer Science Department, School of Mathematical Sciences, Tel Aviv University, Israel.

Publication

Protein Eng. 1995 Oct;8(10):981-97. doi: 10.1093/protein/8.10.981.

DOI: 10.1093/protein/8.10.981
PMID: 8771179

Abstract

Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions, a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology; and (iii) there is no clear-cut definition of structural similarity. The latter depends on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins, or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent, automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required. We obtain a representative set of 220 chains with resolution better than 3.0 Å, or 268 chains including lower resolution entries, NMR entries and models. The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another, or all the immunoglobulin-like folds in a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe" is populated.
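
As a rough illustration of why the number of comparisons matters, and of how clustering against representatives can reduce it, the following Python sketch may help. It is not the authors' algorithm: the abstract does not specify the clustering heuristic, so a generic greedy "leader" clustering is swapped in here. The names `all_pairs_cost`, `select_representatives` and the toy `same_fold` predicate (a stand-in for a real structural comparison) are hypothetical, and the figures of 3250 chains and 2 s per comparison are taken from the abstract as approximate lower bounds.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def all_pairs_cost(n_chains: int, seconds_per_comparison: float) -> float:
    """Seconds needed to compare every unordered pair of chains once."""
    return n_chains * (n_chains - 1) / 2 * seconds_per_comparison


def select_representatives(
    items: List[T], is_similar: Callable[[T, T], bool]
) -> Tuple[List[List[T]], int]:
    """Greedy leader clustering (an assumed stand-in, not the paper's method).

    Each item is compared only with the first member (the representative)
    of every existing cluster. It joins the first cluster whose
    representative it resembles; otherwise it founds a new cluster.
    Returns the clusters and the number of comparisons performed.
    """
    clusters: List[List[T]] = []
    comparisons = 0
    for item in items:
        for cluster in clusters:
            comparisons += 1
            if is_similar(cluster[0], item):
                cluster.append(item)
                break
        else:
            # No representative matched, so this item starts a new cluster.
            clusters.append([item])
    return clusters, comparisons


# Toy data: (hypothetical chain id, fold label). Comparing labels stands in
# for a real 3D structural comparison, which this sketch does not implement.
chains = [
    ("a1", "globin"),
    ("a2", "globin"),
    ("b1", "protease"),
    ("a3", "globin"),
    ("c1", "ig"),
    ("b2", "protease"),
]


def same_fold(x: Tuple[str, str], y: Tuple[str, str]) -> bool:
    return x[1] == y[1]


clusters, used = select_representatives(chains, same_fold)
print("representatives:", [c[0][0] for c in clusters])
print("comparisons used:", used)
print("all-pairs comparisons:", len(chains) * (len(chains) - 1) // 2)
print("all-pairs days for 3250 chains:", all_pairs_cost(3250, 2.0) / 86400)
```

On this toy input the greedy pass needs 7 comparisons rather than the 15 of an all-pairs scan. By the same arithmetic, exhaustively comparing 3250 chains at 2 s per pair would take roughly 122 days of workstation time, which suggests why a heuristic that limits comparisons is needed.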

Similar articles

1. A 3D sequence-independent representation of the protein data bank.
Protein Eng. 1995 Oct;8(10):981-97. doi: 10.1093/protein/8.10.981.
2. PDB-REPRDB: a database of representative protein chains in PDB (Protein Data Bank).
Proc Int Conf Intell Syst Mol Biol. 1997;5:214-7.
3. Three-dimensional, sequence order-independent structural comparison of a serine protease against the crystallographic database reveals active site similarities: potential implications to evolution and to protein folding.
Protein Sci. 1994 May;3(5):769-78. doi: 10.1002/pro.5560030506.
4. Analysis of topological and nontopological structural similarities in the PDB: new examples with old structures.
Proteins. 1996 Jul;25(3):354-65. doi: 10.1002/(SICI)1097-0134(199607)25:3<354::AID-PROT7>3.0.CO;2-F.
5. PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB).
Nucleic Acids Res. 2001 Jan 1;29(1):219-20. doi: 10.1093/nar/29.1.219.
6. Quick selection of representative protein chain sets based on customizable requirements.
Bioinformatics. 2000 Jun;16(6):520-6. doi: 10.1093/bioinformatics/16.6.520.
7. A database of protein structure families with common folding motifs.
Protein Sci. 1992 Dec;1(12):1691-8. doi: 10.1002/pro.5560011217.
8. Selection of a representative set of structures from Brookhaven Protein Data Bank.
Proteins. 1992 Oct;14(2):265-76. doi: 10.1002/prot.340140212.
9. Intrinsic disorder in the Protein Data Bank.
J Biomol Struct Dyn. 2007 Feb;24(4):325-42. doi: 10.1080/07391102.2007.10507123.
10. Automatic classification of protein structures relying on similarities between alignments.
BMC Bioinformatics. 2012 Sep 14;13:233. doi: 10.1186/1471-2105-13-233.

Cited by

1. Pioneer in Molecular Biology: Conformational Ensembles in Molecular Recognition, Allostery, and Cell Function.
J Mol Biol. 2025 Jun 1;437(11):169044. doi: 10.1016/j.jmb.2025.169044. Epub 2025 Feb 25.
2. From the similarity analysis of protein cavities to the functional classification of protein families using cavbase.
J Mol Biol. 2006 Jun 16;359(4):1023-44. doi: 10.1016/j.jmb.2006.04.024. Epub 2006 Apr 25.
3. Detecting distant relatives of mammalian LPS-binding and lipid transport proteins.
Protein Sci. 1998 Jul;7(7):1643-6. doi: 10.1002/pro.5560070721.
4. Seeking an ancient enzyme in Methanococcus jannaschii using ORF, a program based on predicted secondary structure comparisons.
Proc Natl Acad Sci U S A. 1998 Mar 17;95(6):2818-23. doi: 10.1073/pnas.95.6.2818.
5. Structural motifs at protein-protein interfaces: protein cores versus two-state and three-state model complexes.
Protein Sci. 1997 Sep;6(9):1793-805. doi: 10.1002/pro.5560060901.
6. Identification of cooperative folding units in a set of native proteins.
Protein Sci. 1997 Aug;6(8):1627-42. doi: 10.1002/pro.5560060804.
7. The structural alignment between two proteins: is there a unique answer?
Protein Sci. 1996 Jul;5(7):1325-38. doi: 10.1002/pro.5560050711.
8. Protein fold recognition using sequence-derived predictions.
Protein Sci. 1996 May;5(5):947-55. doi: 10.1002/pro.5560050516.