Dudev Minko, Lim Carmay
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan.
BMC Bioinformatics. 2007 Mar 28;8:106. doi: 10.1186/1471-2105-8-106.
For many metalloproteins, sequence motifs characteristic of metal-binding sites have not been found or are so short that they would not be expected to be metal-specific. Striking examples of such metalloproteins are those containing Mg2+, one of the most versatile metal cofactors in cellular biochemistry. Even when Mg2+-proteins share insufficient sequence homology to identify Mg2+-specific sequence motifs, they may still share similarity in the Mg2+-binding site structure. However, no structural motifs characteristic of Mg2+-binding sites have been reported. Thus, our aims are (i) to develop a general method for discovering structural patterns/motifs characteristic of ligand-binding sites, given the 3D protein structures, and (ii) to apply it to Mg2+-proteins sharing <30% sequence identity. Our motif discovery method employs structural alphabet encoding to convert 3D structures to the corresponding 1D structural letter sequences, where the Mg2+-structural motifs are identified as recurring structural patterns.
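To make the idea of "recurring structural patterns" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each protein has already been encoded as a string with one structural letter per residue, and that the indices of the metal-ligand residues are known. The function name `find_recurring_patterns`, the `flank` and `min_proteins` parameters, and the toy letters are all hypothetical; the published method may define binding-site patterns quite differently (for example, over residues that are not contiguous in sequence).

```python
from collections import defaultdict


def find_recurring_patterns(encoded, sites, flank=2, min_proteins=2):
    """Return structural-letter windows around ligand residues that recur
    in at least `min_proteins` different proteins.

    encoded: dict mapping protein id -> structural-letter string
             (one letter per residue; the alphabet itself is assumed given)
    sites:   dict mapping protein id -> list of 0-based ligand residue indices
    """
    seen = defaultdict(set)  # window string -> ids of proteins containing it
    for pid, letters in encoded.items():
        for i in sites.get(pid, []):
            lo, hi = i - flank, i + flank + 1
            if lo < 0 or hi > len(letters):
                continue  # skip windows that run off either chain end
            seen[letters[lo:hi]].add(pid)
    return {w: sorted(p) for w, p in seen.items() if len(p) >= min_proteins}


# Toy data: the letters and indices below are invented for illustration only.
encoded = {
    "protA": "AAKLMNPQAA",
    "protB": "CCKLMNPDDD",
    "protC": "EEEEFGHEEE",
}
sites = {"protA": [4], "protB": [4], "protC": [5]}

motifs = find_recurring_patterns(encoded, sites)
print(motifs)
```

In this toy input, the window around the ligand residue is identical in `protA` and `protB` even though the rest of the two strings differ, which is the situation the abstract describes: little overall sequence similarity, but a shared local structure at the binding site.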
The structural alphabet-based motif discovery method revealed a preference of Mg2+-binding sites for certain local/secondary structures: compared to all residues in the Mg2+-proteins, both first- and second-shell Mg2+-ligands prefer loops to helices. Even when the Mg2+-proteins share no significant sequence homology, some of them share a similar Mg2+-binding site structure: 4 Mg2+-structural motifs, comprising 21% of the binding sites, were found. In particular, one of the Mg2+-structural motifs maps to a specific functional group, namely hydrolases. Furthermore, 2 of the motifs were not found in non-metalloproteins or in Ca2+-binding proteins. The structural motifs discovered thus capture some essential biochemical and/or evolutionary properties, and hence may be useful for discovering proteins in which Mg2+ plays an important biological role.
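The loop-versus-helix comparison can be read as a ratio of frequencies: how often a structure class occurs among Mg2+-ligand residues, divided by how often it occurs among all residues. The sketch below shows one simple way to compute such a ratio; it is an assumption about how the preference could be quantified, not the statistic used in the paper. The function name `class_preference`, the class labels `H`/`E`/`L` (helix, strand, loop), and the counts are hypothetical.

```python
from collections import Counter


def class_preference(all_classes, ligand_classes):
    """Ratio of each class's frequency among ligand residues to its
    frequency among all residues. A value above 1 means the class is
    over-represented among ligands; below 1, under-represented.
    """
    if not all_classes or not ligand_classes:
        raise ValueError("both inputs must be non-empty")
    all_counts = Counter(all_classes)
    lig_counts = Counter(ligand_classes)
    n_all, n_lig = len(all_classes), len(ligand_classes)
    return {
        c: (lig_counts[c] / n_lig) / (all_counts[c] / n_all)
        for c in all_counts
    }


# Invented counts for illustration only: 100 residues overall, 10 ligands.
all_classes = "H" * 50 + "E" * 20 + "L" * 30
ligand_classes = "L" * 6 + "H" * 2 + "E" * 2

ratios = class_preference(all_classes, ligand_classes)
print(ratios)
```

With these made-up counts, loops come out over-represented among ligands and helices under-represented, which mirrors the direction of the reported preference but says nothing about its actual magnitude.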
The structural motif discovery method presented herein is general and can be applied to any set of proteins with known 3D structures. This new method is timely considering the increasing number of structures of proteins with unknown function that are being solved by structural genomics initiatives. For such proteins, which share no significant sequence homology with proteins of known function, the presence in the structure of a structural motif that maps to a specific protein function would suggest likely active/binding sites and a particular biological function.