Aroul-Selvam R, Hubbard Tim, Sasidharan Rajkumar
The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
J Mol Biol. 2004 May 7;338(4):633-41. doi: 10.1016/j.jmb.2004.03.039.
Domains are the structural, functional or evolutionary units of proteins. Proteins can comprise a single domain or a combination of domains. In multi-domain proteins, the domains almost always occur end-to-end, i.e., one domain follows the C-terminal end of another domain. However, there are exceptions to this common pattern, in which multi-domain proteins are formed by insertion of one domain (the insert) into another domain (the parent). Here, we provide a quantitative description of known insertions in the Protein Data Bank (PDB). We found that 9% of domain combinations observed in the non-redundant PDB are insertions. Although 90% of all insertions involve only one insert, proteins can clearly have multiple (nested, two-domain and three-domain) inserts. We also observed correlations between the structure and function of a domain and its tendency to be found as a parent or an insert. Insert positions are biased towards the C terminus of the parent. The atomic distance between the N and C termini of an insert is significantly smaller than the N-to-C distance of domains in a parent context or a single-domain context. Insertions are always found to occur in loop regions of parent domains. Our observations regarding the relationship between domain insertions and the structure, function and evolution of proteins have implications for protein engineering.
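To make the terms parent, insert, insert position and N-to-C distance concrete, here is a minimal Python sketch. It is an illustration only and not the procedure used in the study. It assumes a hypothetical representation in which each domain is a list of inclusive (start, end) residue ranges on a single chain; the function names and the toy values are invented for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def is_insertion(parent, insert):
    """True if the whole insert lies in a gap between two parent segments.

    Assumption: domains are lists of inclusive (start, end) residue ranges.
    """
    parent = sorted(parent)
    insert = sorted(insert)
    ins_start, ins_end = insert[0][0], insert[-1][1]
    # A contiguous parent has no internal gap, so zip() yields nothing.
    for (_, left_end), (right_start, _) in zip(parent, parent[1:]):
        if left_end < ins_start and ins_end < right_start:
            return True
    return False


def relative_insert_position(parent, insert):
    """Fraction of parent residues preceding the insert (0 = N terminus, 1 = C terminus)."""
    ins_start = min(start for start, _ in insert)
    before = sum(end - start + 1 for start, end in parent if end < ins_start)
    total = sum(end - start + 1 for start, end in parent)
    return before / total


def n_to_c_distance(first_atom_xyz, last_atom_xyz):
    """Distance between the first and last residue of a domain (e.g. C-alpha atoms)."""
    return dist(first_atom_xyz, last_atom_xyz)


# Toy example: the parent is split into two segments by a 100-residue insert.
parent = [(1, 100), (201, 250)]
insert = [(101, 200)]
print(is_insertion(parent, insert))              # True
print(is_insertion([(1, 100)], [(101, 200)]))    # False: end-to-end arrangement
print(relative_insert_position(parent, insert))  # 100 of 150 parent residues precede the insert
```

In this representation an end-to-end arrangement has a contiguous parent, while an insertion splits the parent into at least two segments with the insert in between. A relative position above 0.5 corresponds to an insert placed towards the C terminus of the parent.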