School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA.
Nat Methods. 2013 Mar;10(3):221-7. doi: 10.1038/nmeth.2340. Epub 2013 Jan 27.
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.