Fast structure alignment for protein databank searching.

Author information

Orengo C A, Brown N P, Taylor W R

Affiliation

National Institute for Medical Research, London, England.

Publication information

Proteins. 1992 Oct;14(2):139-67. doi: 10.1002/prot.340140203.

DOI:10.1002/prot.340140203

PMID:1409565

Abstract

A fast method is described for searching and analyzing the protein structure databank. It uses secondary structure followed by residue matching to compare protein structures and is developed from a previous structural alignment method based on dynamic programming. Linear representations of secondary structures are derived and their features compared to identify equivalent elements in two proteins. The secondary structure alignment then constrains the residue alignment, which compares only residues within aligned secondary structures and with similar buried areas and torsional angles. The initial secondary structure alignment improves accuracy and provides a means of filtering out unrelated proteins before the slower residue alignment stage. It is possible to search or sort the protein structure databank very quickly using just secondary structure comparisons. A search through 720 structures with a probe protein of 10 secondary structures required 1.7 CPU hours on a Sun 4/280. Alternatively, combined secondary structure and residue alignments, with a cutoff on the secondary structure score to remove pairs of unrelated proteins from further analysis, took 10.1 CPU hours. The method was applied in searches on different classes of proteins and to cluster a subset of the databank into structurally related groups. Relationships were consistent with known families of protein structure.
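The two-stage idea in the abstract (a cheap secondary structure comparison that screens out unrelated proteins before the slower residue alignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the element representation `(type, length)`, the scoring function `sse_similarity`, the gap penalty, and the cutoff value are all assumptions chosen for illustration, and the dynamic programming shown is a generic Needleman-Wunsch style global alignment rather than the specific algorithm of the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each secondary structure element (SSE) is
# (type, length), with type "H" for helix or "E" for strand and length > 0.
SSE = Tuple[str, int]


def sse_similarity(a: SSE, b: SSE) -> float:
    """Illustrative score: penalize different types, discount length mismatch."""
    if a[0] != b[0]:
        return -1.0
    longer = max(a[1], b[1])
    return 1.0 - abs(a[1] - b[1]) / longer


def align_sse(p: List[SSE], q: List[SSE], gap: float = -0.5) -> float:
    """Global dynamic-programming alignment score over two SSE lists."""
    rows, cols = len(p) + 1, len(q) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Leading gaps along the first column and first row.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sse_similarity(p[i - 1], q[j - 1]),
                score[i - 1][j] + gap,  # element of p left unmatched
                score[i][j - 1] + gap,  # element of q left unmatched
            )
    return score[-1][-1]


def prefilter(
    probe: List[SSE], library: Dict[str, List[SSE]], cutoff: float
) -> List[Tuple[str, float]]:
    """Keep library entries whose per-element SSE score reaches the cutoff.

    Only these survivors would be passed on to a residue-level alignment.
    """
    hits = []
    for name, sses in library.items():
        normalized = align_sse(probe, sses) / max(len(probe), 1)
        if normalized >= cutoff:
            hits.append((name, normalized))
    return sorted(hits, key=lambda hit: -hit[1])


if __name__ == "__main__":
    probe = [("H", 10), ("E", 5), ("H", 12)]
    library = {
        "same_fold": [("H", 10), ("E", 5), ("H", 12)],
        "all_beta": [("E", 5), ("E", 6), ("E", 4)],
    }
    for name, score in prefilter(probe, library, cutoff=0.5):
        print(name, round(score, 3))
```

In this sketch the cutoff plays the role of the secondary structure score threshold mentioned in the abstract: pairs scoring below it are dropped, so the more expensive residue comparison runs only on plausible matches.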
