Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S13. doi: 10.1186/1471-2105-14-S3-S13. Epub 2013 Feb 28.
The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site.
Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, the putative annotation, orotidine 5'-monophosphate decarboxylase (OMPDC), is shown to be most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not closely resemble that of any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. Three residues in the active site of the thermophilic β-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, however, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics (SG) protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues of the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena β-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase.
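The idea of comparing residue types at structurally aligned positions can be illustrated with a minimal sketch. It is not the SALSA scoring scheme, which the abstract does not specify: the residue classes, the function names similar_type and score_site_match, and the fraction-based score are all assumptions made for illustration. Only the three residue pairs are taken from the text above.

```python
from typing import Dict, List, Tuple

# Hypothetical coarse residue classes, chosen for illustration only;
# they are not taken from the paper.
RESIDUE_CLASS: Dict[str, str] = {
    "D": "acidic", "E": "acidic",
    "K": "basic", "R": "basic", "H": "basic",
    "S": "polar", "T": "polar", "N": "polar", "Q": "polar", "C": "polar",
    "Y": "aromatic", "F": "aromatic", "W": "aromatic",
}


def similar_type(a: str, b: str) -> bool:
    """Return True when two one-letter residue codes match or share a class."""
    if a == b:
        return True
    class_a = RESIDUE_CLASS.get(a)
    return class_a is not None and class_a == RESIDUE_CLASS.get(b)


def score_site_match(aligned_pairs: List[Tuple[str, str]]) -> float:
    """Fraction of aligned residue pairs whose residue types are similar.

    Each pair holds two labels such as ("E87", "D153"); only the leading
    one-letter residue code of each label is compared.
    """
    if not aligned_pairs:
        return 0.0
    hits = sum(similar_type(ref[0], query[0]) for ref, query in aligned_pairs)
    return hits / len(aligned_pairs)


# Pairs quoted in the abstract: xylanase (1m4w) residue, then the
# POOL-predicted residue of YP_001304206.1 (3h3l).
pairs = [("Y78", "Y168"), ("E87", "D153"), ("E176", "E232")]
print(score_site_match(pairs))
```

Under these assumed classes all three pairs count as similar, which mirrors the statement that the residues are "of similar type". The score says nothing about the substrate recognition regions, which the abstract reports as rather different.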
A local site match provides a more compelling function prediction than that obtainable from a simple 3D structure match. The present method can confirm putative annotations, identify misannotation, and in some cases suggest a more probable annotation.