Friedberg Iddo
Burnham Institute for Medical Research, Program in Bioinformatics and Systems Biology, La Jolla, CA 92037, USA.
Brief Bioinform. 2006 Sep;7(3):225-42. doi: 10.1093/bib/bbl004. Epub 2006 May 23.
Overwhelmed with genomic data, biologists are facing the first big post-genomic question: what do all genes do? First, not only is the volume of pure sequence and structure data growing, but its diversity is growing as well, leading to a disproportionate growth in the number of uncharacterized gene products. Consequently, established methods of gene and protein annotation, such as homology-based transfer, are annotating less data and in many cases are amplifying existing erroneous annotation. Second, there is a need for a functional annotation that is standardized and machine-readable so that function prediction programs can be incorporated into larger workflows. This is problematic due to the subjective and contextual definition of protein function. Third, there is a need to assess the quality of function predictors. Again, the subjectivity of the term 'function' and the various aspects of biological function make this a challenging effort. This article briefly outlines the history of automated protein function prediction and surveys the latest innovations in all three topics.