Parasuram Ramya, Mills Caitlyn L, Wang Zhouxi, Somasundaram Saroja, Beuning Penny J, Ondrechen Mary Jo
Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA.
Methods. 2016 Jan 15;93:51-63. doi: 10.1016/j.ymeth.2015.11.010. Epub 2015 Nov 10.
Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectin/glucanase (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein BACOVA_03626 from Bacteroides ovatus (PDB 3ON6) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacillus halodurans (PDB 2RDY), has a predicted active site that matches well with a known α-L-galactosidase. 
In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0) is shown to have local structural similarity at the predicted active site with the known members of the GH16 family, with the closest match to the endoglucanase subfamily. The method discussed herein can predict whether an SG protein is correctly or incorrectly annotated and can sometimes provide a reliable functional annotation. Examples of application of the method across folds, comparing active sites between two proteins of different structural folds, are also given.