Bashton Matthew, Nobeli Irene, Thornton Janet M
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
J Mol Biol. 2006 Dec 8;364(4):836-52. doi: 10.1016/j.jmb.2006.09.041. Epub 2006 Sep 20.
Here, we present an automatic assignment of potential cognate ligands to domains of enzymes in the CATH and SCOP protein domain classifications on the basis of structural data available in the wwPDB. The procedure involves two steps: first, we assign the binding of particular ligands to particular domains; second, we compare the chemical similarity of the PDB ligands to ligands in KEGG in order to assign cognate ligands. We find that use of Enzyme Commission (EC) numbers is necessary to enable efficient and accurate cognate ligand assignment. The PROCOGNATE database currently has cognate ligand mappings for 3277 (4118) protein structures and 351 (302) superfamilies, as described by the CATH (SCOP) databases. We find that just under half of all ligands are only and always bound by a single domain, 16% are bound by more than one domain, and the remainder show a variety of binding modes. This finding has implications for domain recombination and the evolution of new protein functions. Domain architecture, or context, is also found to affect the substrate specificity of particular domains, and we discuss example cases. The most common PDB ligands are all generic components of crystallisation buffers, highlighting the non-cognate ligand problem inherent in the PDB. In contrast, the most common cognate ligands are all universal cellular currencies of reducing power (such as NADH and FADH2) and energy (such as ATP), reflecting the fact that the vast majority of enzymatic reactions utilise one of these co-factors. These ligands all share a common adenine ribonucleotide moiety, suggesting that many different domain superfamilies have converged to bind this chemical framework.
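The second step of the procedure, matching a ligand observed in a PDB structure to its most similar KEGG ligand, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so this example substitutes a Tanimoto coefficient over hand-written feature sets; it is not the PROCOGNATE method. The function names `tanimoto` and `assign_cognate`, the 0.5 threshold, and the toy feature labels are all assumptions made for illustration, as is the idea that the candidate list holds the KEGG ligands of reactions sharing the enzyme's EC number.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Features = FrozenSet[str]


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets.

    Stand-in similarity measure; the abstract does not say which one is used.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_cognate(
    bound: Features,
    candidates: Dict[str, Features],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the source
) -> Tuple[Optional[str], float]:
    """Return the most similar candidate ligand and its score.

    `candidates` is assumed to hold only the ligands of reactions that share
    the enzyme's EC number. If no candidate reaches the threshold, the bound
    ligand is treated as non-cognate and the name is None.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, features in candidates.items():
        score = tanimoto(bound, features)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Toy feature sets (hypothetical labels, not real chemical fingerprints).
candidates = {
    "ATP": frozenset({"adenine", "ribose", "phosphate_a", "phosphate_b", "phosphate_g"}),
    "glucose": frozenset({"pyranose", "hydroxyl"}),
}
adp = frozenset({"adenine", "ribose", "phosphate_a", "phosphate_b"})
glycerol = frozenset({"hydroxyl", "propane"})

adp_match = assign_cognate(adp, candidates)
glycerol_match = assign_cognate(glycerol, candidates)
print(adp_match)
print(glycerol_match)
```

In this toy setting, ADP bound in a structure is mapped to ATP because the two share four of five features, while glycerol, a typical buffer component, falls below the threshold against every candidate and is left unassigned.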