Database of homology-derived protein structures and the structural meaning of sequence alignment.

Authors

Sander C, Schneider R

Affiliation

European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.

Publication

Proteins. 1991;9(1):56-68. doi: 10.1002/prot.340090107.

DOI:10.1002/prot.340090107
PMID:2017436
Abstract

The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
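The abstract describes a two-step procedure: accept an aligned sequence as structurally homologous only if its identity lies above a length-dependent threshold curve, then summarize the accepted sequences position by position. The Python sketch below illustrates that idea under stated assumptions. It is not the HSSP implementation. The threshold constants follow the commonly quoted form of the 1991 curve, t(L) = 290.15 · L^-0.562 percent identity for 10 ≤ L ≤ 80, held constant above 80 residues, and should be checked against the paper. Alignments shorter than 10 residues are simply rejected. Shannon entropy is swapped in for HSSP's own variability measure. All function names and example sequences are hypothetical.

```python
import math
from collections import Counter


def hssp_threshold(length: int) -> float:
    """Minimum percent identity for structural homology at a given alignment length.

    ASSUMPTION: uses the commonly quoted form of the 1991 HSSP curve,
    290.15 * L**-0.562 for 10 <= L <= 80, held constant above 80 residues.
    """
    if length < 10:
        return math.inf  # too short to call homology at any identity (assumption)
    return 290.15 * min(length, 80) ** -0.562


def percent_identity(a: str, b: str) -> tuple[float, int]:
    """Percent identity and alignment length over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


def is_homologous(structure_row: str, candidate_row: str) -> bool:
    """Accept a candidate if its identity lies on or above the threshold curve."""
    pid, length = percent_identity(structure_row, candidate_row)
    return pid >= hssp_threshold(length)


def column_profile(rows: list[str]) -> list[tuple[dict[str, float], float]]:
    """Per-column residue frequencies plus Shannon entropy in bits.

    Entropy is a stand-in variability measure, not HSSP's own definition.
    """
    result = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values())
        freqs = {aa: n / total for aa, n in counts.items()} if total else {}
        entropy = sum(-p * math.log2(p) for p in freqs.values())
        result.append((freqs, entropy))
    return result


if __name__ == "__main__":
    structure = "ACDEFGHIKLMNPQRSTVWY"     # hypothetical sequence of a known structure
    candidates = ["ACDEFGHIKLMNPQRSAAAA",  # 16/20 identical (80%)
                  "AAAAAAAAAAAAAAAAAAAA"]  # 1/20 identical (5%)
    accepted = [structure] + [c for c in candidates if is_homologous(structure, c)]
    print(len(accepted))                    # 2: the 5% candidate falls below ~54%
    freqs, entropy = column_profile(accepted)[16]
    print(freqs, entropy)                   # {'T': 0.5, 'A': 0.5} 1.0
```

Clamping L at 80 keeps the assumed curve continuous: under these constants a 20-residue alignment needs roughly 54% identity, while alignments of 80 residues or more need only about 25%. This is the length dependence stated in point (4) of the abstract.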

Similar articles

1. Database of homology-derived protein structures and the structural meaning of sequence alignment.
   Proteins. 1991;9(1):56-68. doi: 10.1002/prot.340090107.
2. The HSSP database of protein structure-sequence alignments.
   Nucleic Acids Res. 1994 Sep;22(17):3597-9.
3. Prediction of protein structure by evaluation of sequence-structure fitness. Aligning sequences to contact profiles derived from three-dimensional structures.
   J Mol Biol. 1993 Aug 5;232(3):805-25. doi: 10.1006/jmbi.1993.1433.
4. An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments.
   J Mol Biol. 2000 Aug 18;301(3):691-711. doi: 10.1006/jmbi.2000.3975.
5. Alignment and searching for common protein folds using a data bank of structural templates.
   J Mol Biol. 1993 Jun 5;231(3):735-52. doi: 10.1006/jmbi.1993.1323.
6. Modeling three-dimensional protein structures for amino acid sequences of the CASP3 experiment using sequence-derived predictions.
   Proteins. 1999;Suppl 3:61-5.
7. Homology-based modeling of 3D structures of protein-protein complexes using alignments of modified sequence profiles.
   Int J Biol Macromol. 2008 Aug 15;43(2):198-208. doi: 10.1016/j.ijbiomac.2008.05.004. Epub 2008 May 21.
8. The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures.
   Proteins. 2005 Feb 15;58(3):610-7. doi: 10.1002/prot.20305.
9. A database of protein structure families with common folding motifs.
   Protein Sci. 1992 Dec;1(12):1691-8. doi: 10.1002/pro.5560011217.
10. NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities.
    Proteins. 2005 Feb 15;58(3):628-37. doi: 10.1002/prot.20359.

Cited by

1. Effect of Mutations on the Evolution of Extended Spectrum β-lactamases (ESBL).
   Protein J. 2025 Aug 19. doi: 10.1007/s10930-025-10284-7.
2. Newly Developed Structure-Based Methods Do Not Outperform Standard Sequence-Based Methods for Large-Scale Phylogenomics.
   Mol Biol Evol. 2025 Jul 1;42(7). doi: 10.1093/molbev/msaf149.
3. A universal DNA microarray for rapid fish species authentication.
   Food Chem (Oxf). 2025 Jan 19;10:100241. doi: 10.1016/j.fochms.2025.100241. eCollection 2025 Jun.
4. Discovery and Analysis of Repeat and Low-Complexity Architectures in Proteins and Their Conserved Evolutionary Relationships Using Self-Homology Dot Plots.
   Methods Mol Biol. 2025;2870:95-116. doi: 10.1007/978-1-0716-4213-9_7.
5. Assessing the role of evolutionary information for enhancing protein language model embeddings.
   Sci Rep. 2024 Sep 5;14(1):20692. doi: 10.1038/s41598-024-71783-8.
6. Systematic discovery of DNA-binding tandem repeat proteins.
   Nucleic Acids Res. 2024 Sep 23;52(17):10464-10489. doi: 10.1093/nar/gkae710.
7. SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects.
   NAR Genom Bioinform. 2024 Aug 16;6(3):lqae106. doi: 10.1093/nargab/lqae106. eCollection 2024 Sep.
8. Rational Approach toward COVID-19's Main Protease Inhibitors: A Hierarchical Biochemoinformatics Analysis.
   Int J Mol Sci. 2024 Jun 18;25(12):6715. doi: 10.3390/ijms25126715.
9. Identification, classification, and characterization of alpha and beta subunits of LVP1 protein from the venom gland of four Iranian scorpion species.
   Sci Rep. 2023 Dec 14;13(1):22277. doi: 10.1038/s41598-023-49556-6.
10. GraphPart: homology partitioning for biological sequence analysis.
    NAR Genom Bioinform. 2023 Oct 16;5(4):lqad088. doi: 10.1093/nargab/lqad088. eCollection 2023 Dec.