Kaminska Katarzyna H, Kawai Mikihiko, Boniecki Michal, Kobayashi Ichizo, Bujnicki Janusz M
Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Trojdena 4, 02-109 Warsaw, Poland.
BMC Struct Biol. 2008 Nov 14;8:48. doi: 10.1186/1472-6807-8-48.
Catalytic domains of Type II restriction endonucleases (REases) belong to a few unrelated three-dimensional folds. While the PD-(D/E)XK fold is the most common among these enzymes, crystal structures have also been determined for single representatives of two other folds: PLD (R.BfiI) and half-pipe (R.PabI). Bioinformatics analyses supported by mutagenesis experiments suggested that some REases belong to the HNH fold (e.g. R.KpnI) and that a small group represented by R.Eco29kI belongs to the GIY-YIG fold. However, for a large fraction of REases with known sequences, the three-dimensional fold and the architecture of the active site remain unknown, mostly because extreme sequence divergence hampers the detection of homology to enzymes with known folds.
R.Hpy188I is a Type II REase of unknown structure. PSI-BLAST searches of the non-redundant protein sequence database reveal only one homolog (R.HpyF17I, which has a nearly identical amino acid sequence and the same DNA sequence specificity). Standard application of state-of-the-art protein fold-recognition methods failed to predict a relationship between R.Hpy188I and proteins of known structure or other protein families. To increase the amount of evolutionary information in the multiple sequence alignment, we expanded our sequence database searches to include sequences from metagenomics projects. This search identified 23 additional members of the R.Hpy188I family, drawn from both the metagenomic and the non-redundant databases. Fold-recognition analysis of the extended R.Hpy188I family then revealed its relationship to the GIY-YIG domain and enabled computational modeling of the R.Hpy188I structure. Analysis of the R.Hpy188I model in the light of sequence conservation among its homologs revealed an unusual variant of the active site, in which the typical Tyr residue of the YIG half-motif is replaced by a Lys residue. Moreover, some of its homologs carry the otherwise invariant Arg residue at a non-homologous sequence position, which nonetheless preserves the spatial location of the guanidino group potentially involved in phosphate binding.
The present study fills a significant "white spot" on the structural map of REases. It also provides important insight into sequence-structure-function relationships in the GIY-YIG nuclease superfamily. Our results show that, for proteins with no or few detectable homologs in the standard "non-redundant" database, it is useful to expand this database with metagenomic sequences, which may provide the evolutionary links needed to detect more remote homologs.