Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S6. doi: 10.1186/1471-2105-14-S3-S6. Epub 2013 Feb 28.
Annotating protein function with both high accuracy and sensitivity remains a major challenge in structural genomics. One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. In order to further increase sensitivity, we now let each protein contribute multiple templates rather than just one, and also let the template size vary.
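The idea of diffusing known annotations over a network of template matches can be illustrated with a small sketch. This is not the ETA implementation: the edge weights, the damping factor `alpha`, the clamping of known labels, and the function names `diffuse_annotations` and `top_label` are all assumptions chosen for illustration, and the protein names and function labels are made up.

```python
from collections import defaultdict


def diffuse_annotations(edges, known, n_iter=50, alpha=0.8):
    """Spread known function labels over a match network (illustrative only).

    edges: iterable of (protein_u, protein_v, weight) template matches.
    known: dict mapping protein -> known function label.
    Returns dict mapping protein -> {label: score}.
    """
    # Build an undirected weighted adjacency map.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    labels = sorted(set(known.values()))
    nodes = sorted(set(neighbors) | set(known))

    # Annotated proteins start with score 1.0 for their own label.
    seed = {
        n: {lab: 1.0 if known.get(n) == lab else 0.0 for lab in labels}
        for n in nodes
    }
    scores = {n: dict(seed[n]) for n in nodes}

    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            if n in known:
                # Assumption: known annotations are clamped, never overwritten.
                updated[n] = dict(seed[n])
                continue
            total = sum(neighbors[n].values())
            updated[n] = {}
            for lab in labels:
                if total == 0:
                    updated[n][lab] = 0.0
                    continue
                # Weighted average of neighbor scores, damped by alpha.
                spread = sum(w * scores[m][lab] for m, w in neighbors[n].items())
                updated[n][lab] = alpha * spread / total
        scores = updated
    return scores


def top_label(scores, protein):
    """Return the highest-scoring label for a protein, or None if none scored."""
    label_scores = scores[protein]
    if not label_scores:
        return None
    best = max(label_scores, key=label_scores.get)
    return best if label_scores[best] > 0 else None


# Hypothetical toy network: A and D are annotated, B and C are not.
toy_edges = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0)]
toy_known = {"A": "hydrolase", "D": "kinase"}
toy_scores = diffuse_annotations(toy_edges, toy_known)
print(top_label(toy_scores, "B"), top_label(toy_scores, "C"))
```

In this toy chain, each unannotated protein ends up favoring the label of the annotated protein it is closest to. Letting each protein contribute several templates would, in this picture, simply add more edges to `toy_edges`.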
Retrospective benchmarks on 605 Structural Genomics enzymes showed that multiple templates, when combined with single-template predictions, increased sensitivity by up to 14% while keeping accuracy above 91%. Diffusing function globally on networks of single- and multiple-template matches only marginally increased the area under the ROC curve (over 0.97), but in a subset of 91 proteins that could not be annotated by ETA, the network approach recovered annotations for the 20-23 most confident cases with 100% accuracy.
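To make the two reported measures concrete, the sketch below computes them on a toy benchmark. The definitions used here are an assumption, since the abstract does not spell them out: accuracy is taken as correct predictions divided by predictions made, and sensitivity as correct predictions divided by all benchmark proteins. The function name and the toy data are hypothetical.

```python
def accuracy_and_sensitivity(predicted, truth):
    """Compute (accuracy, sensitivity) under the assumed definitions.

    predicted: dict protein -> predicted label, or None when no prediction is made.
    truth: dict protein -> true label for every benchmark protein.
    """
    # Only count proteins that received a prediction and have a known truth.
    made = {p: lab for p, lab in predicted.items() if lab is not None and p in truth}
    correct = sum(1 for p, lab in made.items() if lab == truth[p])
    accuracy = correct / len(made) if made else 0.0
    sensitivity = correct / len(truth) if truth else 0.0
    return accuracy, sensitivity


# Hypothetical benchmark of four proteins; one gets no prediction.
toy_truth = {"P1": "hydrolase", "P2": "kinase", "P3": "kinase", "P4": "lyase"}
toy_predicted = {"P1": "hydrolase", "P2": "kinase", "P3": "lyase", "P4": None}
acc, sens = accuracy_and_sensitivity(toy_predicted, toy_truth)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f}")
```

Under these definitions, adding predictions from extra templates raises sensitivity whenever they are correct for proteins that previously had no prediction, and accuracy holds as long as the new predictions are right about as often as the old ones. For scale, 20 to 23 recovered cases out of 91 correspond to roughly 22% to 25% of that subset.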
We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.