Knowledge Management in Bioinformatics, Humboldt-University Berlin, Unter den Linden 6, 10099 Berlin, Germany.
BMC Bioinformatics. 2008 Jul 22;9 Suppl 8(Suppl 8):S2. doi: 10.1186/1471-2105-9-S8-S2.
Functional annotation of proteins remains a challenging task. Currently, the scientific literature serves as the main source of functional annotations that have not yet been curated, but curation work is slow and expensive. Automatic techniques that support this work still lack reliability. We developed a method to identify conserved protein interaction graphs and to predict missing protein functions from orthologs in these graphs. To enhance the precision of the results, we also implemented a procedure that validates all predictions against findings reported in the literature.
Using this procedure, more than 80% of the GO annotations available in UniProtKB/Swiss-Prot for proteins with highly conserved orthologs could be verified automatically. For a subset of proteins, we predicted new GO annotations that were not available in UniProtKB/Swiss-Prot. All of these predictions were correct (100% precision) according to verification by a trained curator.
Our method of combining conserved interaction subgraphs (CCSs) with literature mining is thus a highly reliable approach to predicting GO annotations for weakly characterized proteins that have orthologs.
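The two-stage idea summarized above (transfer annotations from orthologs, then keep only those supported by the literature) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy protein identifiers, and the sentence-level co-occurrence check are all assumptions made for illustration, and the step that detects conserved interaction graphs is omitted entirely.

```python
from typing import Dict, List, Set, Tuple


def predict_missing_annotations(
    orthologs: Dict[str, str],
    annotations: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Propose GO terms that an ortholog carries but the target protein lacks."""
    predictions = []
    for target, ortholog in sorted(orthologs.items()):
        known = annotations.get(target, set())
        for go_term in sorted(annotations.get(ortholog, set()) - known):
            predictions.append((target, go_term))
    return predictions


def supported_by_literature(
    protein: str,
    go_label: str,
    sentences: List[str],
) -> bool:
    """Toy validation: protein name and GO term label co-occur in one sentence."""
    protein_l, label_l = protein.lower(), go_label.lower()
    return any(protein_l in s.lower() and label_l in s.lower() for s in sentences)


# Hypothetical toy data, not taken from the study.
orthologs = {"protA_human": "protA_mouse"}
annotations = {
    "protA_human": {"GO:0005515"},
    "protA_mouse": {"GO:0005515", "GO:0006281"},
}
go_labels = {"GO:0005515": "protein binding", "GO:0006281": "DNA repair"}
names = {"protA_human": "ProtA"}
sentences = ["ProtA is required for DNA repair after UV damage."]

# Stage 1: transfer annotations that the ortholog has and the target lacks.
predictions = predict_missing_annotations(orthologs, annotations)

# Stage 2: keep only predictions with (toy) literature support.
validated = [
    (protein, term)
    for protein, term in predictions
    if supported_by_literature(names[protein], go_labels[term], sentences)
]
```

Separating prediction from validation mirrors the abstract's emphasis on precision: the first stage may over-generate candidates, and only those with independent textual support are retained.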