Zhang L, Godzik A, Skolnick J, Fetrow J S
Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.
Fold Des. 1998;3(6):535-48. doi: 10.1016/s1359-0278(98)00069-8.
Database-searching methods based on sequence similarity have become the most commonly used tools for characterizing newly sequenced proteins. Because the functional diversity within protein families and superfamilies is often underestimated, however, it is difficult to make such characterization specific and accurate. In this work, we have extended a method for identifying active sites in predicted protein structures.
We studied the structural conservation and variation of the active sites of alpha/beta hydrolases with known structures. The similarities were incorporated into a three-dimensional motif that specifies the essential requirements for enzymatic function. A threading algorithm was used to align 651 Escherichia coli open reading frames (ORFs) to one of the members of the alpha/beta hydrolase fold family. These ORFs were then screened against the three-dimensional motif, with the additional requirement that the key active-site residues be conserved among the proteins that bear significant sequence similarity to the ORFs. Seventeen ORFs from E. coli were predicted to have hydrolase activity, and their putative active-site residues were identified. Most of these predictions agreed with experimental results and with the results of other database-searching methods. The study further suggests that YHET_ECOLI, a hypothetical protein classified as a member of the UPF0017 family (an uncharacterized protein family), bears all the hallmarks of the alpha/beta hydrolase family.
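The two-stage screen described above (a three-dimensional motif check followed by a conservation check among sequence homologs) can be illustrated with a minimal sketch. The abstract does not give the motif's actual parameters, so everything numeric here is an assumption: the allowed residue classes follow the nucleophile/acid/histidine catalytic triad commonly described for alpha/beta hydrolases, while the C-alpha target distances, the tolerance, and the conservation threshold are placeholder values chosen only for illustration. The names `Residue`, `matches_motif`, and `conserved_in_homologs` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, Tuple


@dataclass(frozen=True)
class Residue:
    name: str                        # one-letter amino acid code
    ca: Tuple[float, float, float]   # C-alpha coordinates in angstroms


# Assumed residue classes for each catalytic role (illustrative only).
ALLOWED = {
    "nucleophile": set("SCD"),
    "acid": set("DE"),
    "base": set("H"),
}

# Placeholder C-alpha distances in angstroms; NOT values from the paper.
TARGET_DIST = {
    ("nucleophile", "acid"): 8.0,
    ("nucleophile", "base"): 5.0,
    ("acid", "base"): 5.0,
}
TOLERANCE = 1.5  # placeholder tolerance in angstroms


def matches_motif(site: Dict[str, Residue]) -> bool:
    """Check residue identity and pairwise geometry of a putative active site."""
    for role, allowed in ALLOWED.items():
        if site[role].name not in allowed:
            return False
    for (a, b), target in TARGET_DIST.items():
        if abs(dist(site[a].ca, site[b].ca) - target) > TOLERANCE:
            return False
    return True


def conserved_in_homologs(columns: Dict[str, str], min_fraction: float = 0.8) -> bool:
    """Require each role's aligned column to be mostly allowed residues.

    `columns` maps a role to the residues seen at that aligned position
    across homologous sequences; `min_fraction` is an assumed threshold.
    """
    for role, allowed in ALLOWED.items():
        column = columns[role]
        if not column:
            return False
        fraction = sum(res in allowed for res in column) / len(column)
        if fraction < min_fraction:
            return False
    return True


def predict_hydrolase(site: Dict[str, Residue], columns: Dict[str, str]) -> bool:
    """An ORF passes only if both the motif and the conservation checks pass."""
    return matches_motif(site) and conserved_in_homologs(columns)
```

In this sketch, `site` would come from mapping the template's active-site positions onto the ORF through the threading alignment, and `columns` from a multiple alignment of the ORF's sequence homologs; both steps are outside the scope of the example.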
The novel feature of our method is that it uses three-dimensional structural information for function prediction. The results demonstrate the importance and necessity of such a method to fill the gap between sequence alignment and function prediction; furthermore, the method provides a way to verify the structure predictions, which enables an expansion of the applicable scope of the threading algorithms.