Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 196 Genome Research Center, 500 Sunnyside Boulevard, Woodbury, NY 11797, USA.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S15. doi: 10.1186/1471-2105-14-s3-s15.
The assignment of gene function remains a difficult but important task in computational biology. The first Critical Assessment of Functional Annotation (CAFA) was established to accelerate progress in the field. We present an independent analysis of the results of CAFA, aimed at identifying challenges in assessment and at understanding trends in prediction performance. We found that well-accepted methods based on sequence similarity (i.e., BLAST) have a dominant effect. Many of the most informative predictions turned out either to recover existing knowledge about sequence similarity or to be "post-dictions" already documented in the literature. These results indicate that deep challenges remain in even defining the task of function assignment, with a particular difficulty posed by the problem of defining function in a way that is not dependent on either flawed gold standards or the input data itself. In particular, we suggest that using the Gene Ontology (or other similar systematizations of function) as a gold standard is unlikely to be the way forward.