Holm L, Ouzounis C, Sander C, Tuparev G, Vriend G
European Molecular Biology Laboratory, Heidelberg, Germany.
Protein Sci. 1992 Dec;1(12):1691-8. doi: 10.1002/pro.5560011217.
The availability of fast and robust algorithms for protein structure comparison provides an opportunity to produce a database of three-dimensional comparisons, called families of structurally similar proteins (FSSP). The database currently contains an extended structural family for each of 154 representative (below 30% sequence identity) protein chains. Each data set contains: the search structure; all its relatives with 70-30% sequence identity, aligned structurally; and all other proteins from the representative set that contain substructures significantly similar to the search structure. Very close relatives (above 70% sequence identity) rarely have significant structural differences and are excluded. The alignments of remote relatives are the result of pairwise all-against-all structural comparisons in the set of 154 representative protein chains. The comparisons were carried out with each of three novel automatic algorithms that cover different aspects of protein structure similarity. The user of the database can choose between strict rigid-body comparisons and comparisons that take into account interdomain motion or geometrical distortions, and between comparisons that require strictly sequential ordering of segments and comparisons that allow altered topology of loop connections or chain reversals. The data sets report the structurally equivalent residues in the form of a multiple alignment and as a list of matching fragments to facilitate inspection by three-dimensional graphics. If substructures are ignored, the result is a database of structure alignments of full-length proteins, including those in the twilight zone of sequence similarity. (ABSTRACT TRUNCATED AT 250 WORDS)
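The sequence-identity bands that determine what goes into each data set can be illustrated with a minimal sketch. This is not the FSSP implementation: the function name, the category labels, and the handling of the exact 30% and 70% boundaries are assumptions made for illustration. Sequence identity alone also does not decide whether a remote relative is included, since the abstract requires a significant structural similarity for that.

```python
def classify_by_identity(identity_pct: float) -> str:
    """Place a protein chain into one of the sequence-identity bands
    described in the abstract, relative to a search structure.

    Hypothetical helper; boundary handling (30 and 70 inclusive in the
    middle band) is an assumption, not something the abstract specifies.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("sequence identity must be a percentage in [0, 100]")
    if identity_pct > 70.0:
        # Very close relatives rarely differ structurally and are excluded.
        return "excluded_close_relative"
    if identity_pct >= 30.0:
        # Relatives with 70-30% identity are aligned structurally.
        return "aligned_relative"
    # Below 30%: kept only if a structural comparison (not modeled here)
    # finds a significantly similar substructure.
    return "remote_candidate"


# Example: identities (in percent) of four hypothetical chains.
identities = {"chainA": 85.0, "chainB": 45.0, "chainC": 30.0, "chainD": 12.5}
bands = {name: classify_by_identity(pct) for name, pct in identities.items()}
```

In this sketch, `chainA` would be dropped, `chainB` and `chainC` would be aligned as relatives, and `chainD` would still need a structural comparison to decide whether it belongs to the family.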