Iliopoulos Ioannis, Tsoka Sophia, Andrade Miguel A, Enright Anton J, Carroll Mark, Poullet Patrick, Promponas Vassilis, Liakopoulos Theodore, Palaios Giorgos, Pasquier Claude, Hamodrakas Stavros, Tamames Javier, Yagnik Asutosh T, Tramontano Anna, Devos Damien, Blaschke Christian, Valencia Alfonso, Brett David, Martin David, Leroy Christophe, Rigoutsos Isidore, Sander Chris, Ouzounis Christos A
Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.
Bioinformatics. 2003 Apr 12;19(6):717-26. doi: 10.1093/bioinformatics/btg077.
Genome-wide functional annotation, whether manual or automatic, has raised considerable concerns about the accuracy of assignments and the reproducibility of methodologies. In addition, automated systems that attempt to carry out sequence analyses rapidly and reproducibly have generally lacked a performance evaluation. To quantify the accuracy and reproducibility of function assignments on a genome-wide scale, we have collaboratively re-annotated the entire genome sequence of Chlamydia trachomatis (serovar D).
We have encoded all annotations in a structured format to allow further comparison and data exchange. We have also used a scale that classifies potential annotation errors by their propensity to propagate through databases via transitive function assignments. We conclude that genome annotation may entail a considerable number of errors, ranging from simple typographical errors to complex sequence analysis problems. The most surprising result of this comparative study is that automatic systems might perform as well as the teams of experts who annotate genome sequences.