Siddiqui A S, Barton G J
Laboratory of Molecular Biophysics, University of Oxford, United Kingdom.
Protein Sci. 1995 May;4(5):872-84. doi: 10.1002/pro.5560040507.
An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures and the results compared to domain definitions obtained from the literature, or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions; 18% show minor differences, and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions. The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
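The abstract does not give the scoring function or parameters, so the following is only a minimal sketch of the simplest case it mentions: a chain divided at one cut point into two continuous segments. Everything in it is an assumption made for illustration: the function names, the use of one coordinate per residue, the distance cutoff, and the score itself (contacts inside each part rewarded, contacts across the cut penalised). It should not be read as the authors' implementation, and it does not cover domains built from two or more segments.

```python
from itertools import combinations
from math import dist


def contact_pairs(coords, cutoff):
    """Return index pairs (i, j), i < j, whose points lie closer than cutoff."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) < cutoff
    }


def best_single_cut(coords, cutoff=8.0, min_len=3):
    """Scan every cut position and return (cut, score) for the best split.

    Residues [0, cut) form part A and residues [cut, n) form part B.
    The score is hypothetical: (internal A * internal B) / (external + 1)^2,
    where the +1 only avoids division by zero when no contact crosses the cut.
    """
    contacts = contact_pairs(coords, cutoff)
    n = len(coords)
    best = None
    for cut in range(min_len, n - min_len + 1):
        int_a = sum(1 for i, j in contacts if j < cut)        # both in A
        int_b = sum(1 for i, j in contacts if i >= cut)       # both in B
        ext = sum(1 for i, j in contacts if i < cut <= j)     # across the cut
        score = (int_a * int_b) / ((ext + 1) ** 2)
        if best is None or score > best[1]:
            best = (cut, score)
    return best


# Toy input: two compact clusters of four points each, far apart.
toy = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0),
    (20, 0, 0), (21, 0, 0), (20, 1, 0), (21, 1, 0),
]
cut, score = best_single_cut(toy, cutoff=3.0, min_len=2)
```

On the toy input the scan places the cut between the two clusters (index 4), because that is the only position with no contacts crossing it. A real chain always has contacts between sequence neighbours across any cut, so in practice a threshold on the score, rather than the maximum alone, would be needed to decide whether a protein should be split at all.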