Ruđer Bošković Institute, Division of Electronics, Zagreb, Croatia.
PLoS Comput Biol. 2012 May;8(5):e1002533. doi: 10.1371/journal.pcbi.1002533. Epub 2012 May 31.
Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon, an important outcome given that >98% of all annotations are inferred without direct curation.
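The abstract does not spell out how the release-to-release comparison is scored. The sketch below is a hypothetical, simplified illustration of the general idea, not the authors' actual method. It assumes a made-up `Annotation` record and a made-up `reliability` definition. Under that definition, an electronic annotation in an older release is "confirmed" if the same (protein, GO term) pair later appears with a curated (non-electronic) annotation, and "rejected" if the pair disappears entirely. Annotations that persist without confirmation are left out of the ratio. Exact term matching is used, so the GO hierarchy is ignored.

```python
from dataclasses import dataclass


# Hypothetical, simplified record for one GO annotation of one protein.
@dataclass(frozen=True)
class Annotation:
    protein: str      # e.g. a UniProt accession
    go_term: str      # e.g. "GO:0005524"
    electronic: bool  # True if inferred without curator supervision


def reliability(old_release, new_release):
    """Hypothetical reliability score for electronic annotations.

    Returns confirmed / (confirmed + rejected), where an old electronic
    annotation is 'confirmed' if a curated annotation of the same
    (protein, term) pair exists in the newer release, and 'rejected' if
    the pair is absent from the newer release altogether. Assumption:
    exact term matching only; parent/child GO terms are not considered.
    Returns NaN when no annotation is either confirmed or rejected.
    """
    new_pairs = {(a.protein, a.go_term) for a in new_release}
    curated_new = {(a.protein, a.go_term) for a in new_release if not a.electronic}

    confirmed = rejected = 0
    for a in old_release:
        if not a.electronic:
            continue  # only electronic annotations are being evaluated
        pair = (a.protein, a.go_term)
        if pair in curated_new:
            confirmed += 1
        elif pair not in new_pairs:
            rejected += 1  # annotation was dropped in the later release
        # otherwise the annotation persists unconfirmed and is not counted

    total = confirmed + rejected
    return confirmed / total if total else float("nan")


if __name__ == "__main__":
    old = [
        Annotation("P1", "GO:0000001", True),
        Annotation("P2", "GO:0000002", True),
        Annotation("P3", "GO:0000003", True),
    ]
    new = [
        Annotation("P1", "GO:0000001", False),  # curated confirmation
        Annotation("P3", "GO:0000003", True),   # persists, unconfirmed
    ]
    print(reliability(old, new))
```

In this toy example, one electronic annotation is later confirmed by curation, one disappears, and one persists unchanged, so the score is 1 / 2 = 0.5. A real evaluation would also need to handle the GO hierarchy (for example, counting a curated annotation to a more specific term as confirmation) and the many possible reasons an annotation is removed.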