Islam S A, Luo J, Sternberg M J
Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London, UK.
Protein Eng. 1995 Jun;8(6):513-25. doi: 10.1093/protein/8.6.513.
An automatic algorithm based on inter-residue contacts is presented for identifying domains in proteins. The results of the algorithm are compared with an assignment made by inspection, guided by the authors' description in the literature. The authors' and the algorithm's assignments for a chain were considered to agree if the same number of domains was identified and the assignments were the same for at least 95% of the residues. By this criterion, the algorithm agreed with the authors' assignment for 78% of the 284 non-redundant chains considered. When some of the authors' assignments were re-evaluated in light of the algorithm's results, agreement rose to 84%. The algorithm is therefore a useful tool for data validation in domain assignment. The authors' assignments of domains were analysed for structural principles of domains. The numbers of chains forming one, two, three, four and five domains are 197, 67, 13, 6 and 1, respectively. Most domains in multidomain proteins are formed from continuous segments and adopt the same structural class. Distributions of the number of residues and the ellipticity of domains and chains are presented. The relationship between accessible surface area and molecular weight for domains and chains is examined.
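The agreement criterion described above (same number of domains, and identical assignment for at least 95% of residues) can be illustrated with a minimal sketch. This is not the authors' code. The function name `assignments_agree`, the per-residue label representation, and the handling of arbitrary domain labels by trying every relabelling are all assumptions made for illustration; the abstract does not say how label correspondence or unassigned residues were treated.

```python
from itertools import permutations


def assignments_agree(reference, predicted, threshold=0.95):
    """Hypothetical check of the agreement criterion from the abstract.

    reference, predicted: equal-length sequences holding one domain label
    per residue (labels may be ints or strings).
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("assignments must be non-empty and cover the same residues")

    ref_labels = sorted(set(reference))
    pred_labels = sorted(set(predicted))

    # Criterion 1: both assignments identify the same number of domains.
    if len(ref_labels) != len(pred_labels):
        return False

    # Criterion 2: at least `threshold` of residues are assigned identically.
    # Assumption: domain labels are arbitrary, so every one-to-one relabelling
    # is tried and the best match is kept (cheap for five or fewer domains).
    best = 0
    for perm in permutations(ref_labels):
        mapping = dict(zip(pred_labels, perm))
        same = sum(1 for r, p in zip(reference, predicted) if mapping[p] == r)
        best = max(best, same)

    return best / len(reference) >= threshold


# Illustrative data only: a 200-residue chain with two domains.
reference = [1] * 100 + [2] * 100
close_call = ["B"] * 96 + ["A"] * 104   # boundary shifted by 4 residues
far_off = [1] * 80 + [2] * 120          # boundary shifted by 20 residues
one_domain = [1] * 200                  # different number of domains
```

With these illustrative inputs, `close_call` matches the reference for 196 of 200 residues and passes the 95% threshold, `far_off` matches only 180 of 200 and fails, and `one_domain` fails on the domain count alone.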