Database of homology-derived protein structures and the structural meaning of sequence alignment.

Author information

Sander C, Schneider R

Affiliation

European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.

Publication information

Proteins. 1991;9(1):56-68. doi: 10.1002/prot.340090107.

Abstract

The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
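As a rough illustration of observation (4), the sketch below tests whether an alignment lies above a length-dependent identity threshold. The abstract does not state the functional form of the curve; the constants used here (290.15, -0.562, and a 24.8% plateau beyond 80 residues) are the values commonly quoted for the HSSP threshold and should be checked against the full paper. The function names and the treatment of alignments shorter than 10 residues are assumptions made for this example only.

```python
import math


def hssp_threshold(length: int) -> float:
    """Percent-identity threshold for an alignment of `length` residues.

    Constants are the commonly quoted HSSP-curve values (an assumption here;
    the abstract itself does not give them).
    """
    if length < 10:
        # Assumption: alignments this short are never treated as significant.
        return math.inf
    if length <= 80:
        # The threshold falls steeply as the alignment gets longer.
        return 290.15 * length ** -0.562
    # Beyond 80 residues the threshold is taken as a flat 24.8% identity.
    return 24.8


def implies_structural_homology(identity_pct: float, length: int) -> bool:
    """True if the alignment lies on or above the threshold curve."""
    return identity_pct >= hssp_threshold(length)


if __name__ == "__main__":
    for length in (10, 20, 40, 80, 150):
        print(length, round(hssp_threshold(length), 1))
```

Under these assumed constants, 30% identity over 100 aligned residues would pass the test, while the same 30% identity over only 20 residues would not, which is the length dependence the abstract describes.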

