Kristensen David M, Ward R Matthew, Lisewski Andreas Martin, Erdin Serkan, Chen Brian Y, Fofanov Viacheslav Y, Kimmel Marek, Kavraki Lydia E, Lichtarge Olivier
Department of Molecular and Human Genetics, Biophysics, Baylor College of Medicine, Houston, TX 77030, USA.
BMC Bioinformatics. 2008 Jan 11;9:17. doi: 10.1186/1471-2105-9-17.
Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates: structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates.
Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits with the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required both to match the 3D template geometrically and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than that of either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable.
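The plurality vote and the combination with sequence homology described above can be illustrated with a minimal sketch. It is not the ETA pipeline's code: the function names, the input shapes (a mapping from matched protein ID to EC number, and a set of homolog IDs), and the choice to return no prediction on a tie are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def ec_prefix(ec_number: str, digits: int = 3) -> str:
    """Keep the first `digits` fields of an EC number, e.g. '3.4.21.4' -> '3.4.21'."""
    return ".".join(ec_number.split(".")[:digits])


def plurality_vote(match_ecs: Iterable[str], digits: int = 3) -> Optional[str]:
    """Return the EC prefix carried by the most matches.

    Assumption: when there are no matches, or the top two annotations tie,
    no single function is predicted and None is returned.
    """
    counts = Counter(ec_prefix(ec, digits) for ec in match_ecs)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no single most likely function
    return ranked[0][0]


def combined_matches(template_matches: Dict[str, str],
                     homologs: Set[str]) -> Dict[str, str]:
    """Keep only template matches that are also sequence homologs.

    `template_matches` maps a matched protein ID to its EC number;
    `homologs` is a hypothetical set of IDs reported by BLAST or PSI-BLAST.
    """
    return {pid: ec for pid, ec in template_matches.items() if pid in homologs}
```

In this sketch, a template whose matches carry the EC numbers 3.4.21.4, 3.4.21.5, and 1.1.1.1 would be assigned 3.4.21, while a template with one match for each of two different three-digit classes would receive no prediction. Running `plurality_vote` on the values returned by `combined_matches` corresponds to requiring both geometric and sequence evidence before voting.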
These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large-scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.