Seoul National University Biomedical Informatics, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S40. doi: 10.1186/1471-2105-12-S1-S40.
The Gene Ontology (GO) provides a controlled vocabulary for describing genes and gene products. Despite the undoubted importance of GO, several drawbacks of GO and of GO-based annotations have been reported. We identified three types of semantic inconsistency in GO-based annotations: semantically redundant, biological-domain inconsistent, and taxonomy inconsistent annotations.
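A semantically redundant annotation can be illustrated with a small sketch. It assumes, as our reading rather than the paper's stated definition, that redundancy means a gene product is annotated with both a GO term and one of that term's ancestors, which the GO true-path rule already implies. The graph `GO_PARENTS`, its term labels, and the helpers `ancestors`, `redundant_pairs`, and `most_specific` are hypothetical illustrations, not GOChase-II code or real GO identifiers.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy GO fragment: term -> direct parents.
# Relationship types (is_a, part_of) are collapsed into one edge type here.
GO_PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "A1": ["A"],
    "A1x": ["A1"],
    "B": ["root"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return all proper ancestors of `term` via depth-first traversal."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def redundant_pairs(terms: Set[str], parents: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List (specific, general) pairs in one gene product's annotation set
    where `general` is an ancestor of `specific` and is therefore implied."""
    pairs: List[Tuple[str, str]] = []
    for t in sorted(terms):
        for a in sorted(ancestors(t, parents) & terms):
            pairs.append((t, a))
    return pairs


def most_specific(terms: Set[str], parents: Dict[str, List[str]]) -> List[str]:
    """One possible correction: keep only terms that are not an ancestor
    of another term in the same annotation set."""
    implied: Set[str] = set()
    for t in terms:
        implied |= ancestors(t, parents)
    return sorted(terms - implied)
```

For the toy set `{"A", "A1", "A1x"}`, `redundant_pairs` reports `("A1", "A")`, `("A1x", "A")`, and `("A1x", "A1")`, and `most_specific` keeps only `"A1x"`. Dropping implied ancestors is only one plausible correction policy. A real tool might instead flag the pair for curator review, for example when the two annotations carry different evidence codes.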
To detect semantic inconsistencies in GO annotations, we used the hierarchical structure of the GO graph and the tree structure of the NCBI taxonomy. Twenty-seven biological databases were collected to search for semantically inconsistent annotations.
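The role of the NCBI taxonomy tree can be sketched as an ancestor walk. This assumes, as an illustration and not as the paper's stated rule, that some GO terms are restricted to a taxon. Under that assumption, an annotation is taxonomy-inconsistent when the annotated species does not lie in that taxon's subtree. The taxids below are real NCBI Taxonomy identifiers, but the lineage is heavily simplified. The term `"T:eukaryote_only"` and the map `TERM_TAXON` are hypothetical.

```python
from typing import Dict, Optional

# Simplified NCBI taxonomy fragment: child taxid -> parent taxid.
# Intermediate ranks are omitted for brevity.
TAX_PARENT: Dict[int, Optional[int]] = {
    1: None,        # root
    131567: 1,      # cellular organisms
    2: 131567,      # Bacteria
    2759: 131567,   # Eukaryota
    562: 2,         # Escherichia coli (lineage simplified)
    9606: 2759,     # Homo sapiens (lineage simplified)
}

# Hypothetical restriction: GO term -> taxid whose subtree may use the term.
TERM_TAXON: Dict[str, int] = {"T:eukaryote_only": 2759}


def is_descendant(taxid: int, ancestor: int, parent: Dict[int, Optional[int]]) -> bool:
    """Walk from `taxid` up to the root, checking whether `ancestor` is on the path."""
    node: Optional[int] = taxid
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False


def taxonomy_consistent(term: str, taxid: int,
                        term_taxon: Dict[str, int] = TERM_TAXON,
                        parent: Dict[int, Optional[int]] = TAX_PARENT) -> bool:
    """True if `term` has no taxon restriction or `taxid` falls inside it."""
    restriction = term_taxon.get(term)
    if restriction is None:
        return True
    return is_descendant(taxid, restriction, parent)
```

With these toy tables, annotating the human gene product (taxid 9606) with `"T:eukaryote_only"` passes the check, while annotating an *E. coli* gene product (taxid 562) with the same term fails it. Walking upward from the species is cheap because each taxid in the NCBI tree has a single parent.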
The distributions and possible causes of the semantic inconsistencies were investigated using the twenty-seven biological databases with GO-based annotations. We found that some annotation evidence codes were associated with the inconsistencies. The numbers of gene products and species in a database, which are related to the complexity of database management, were also correlated with the inconsistencies. Consequently, numerous annotation errors arise and propagate throughout biological databases and into GO-based high-level analyses. GOChase-II was developed to detect and correct both syntactic and semantic errors in GO-based annotations.
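The two analyses above can be sketched as per-evidence-code inconsistency rates and a correlation between database size and inconsistency counts. The records are hypothetical placeholders, not data from the study. The abstract does not say which correlation statistic was used, so the Pearson coefficient here is our own choice. `IEA`, `IDA`, and `ISS` are standard GO evidence codes.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def inconsistency_rate_by_evidence(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Given (evidence_code, is_inconsistent) pairs, return the fraction of
    flagged annotations for each evidence code."""
    total: Counter = Counter()
    flagged: Counter = Counter()
    for code, inconsistent in records:
        total[code] += 1
        if inconsistent:
            flagged[code] += 1
    return {code: flagged[code] / total[code] for code in sorted(total)}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. gene products per database vs. inconsistencies."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

For example, the hypothetical records `[("IEA", True), ("IEA", False), ("IDA", False), ("IDA", False), ("ISS", True)]` yield rates of 0.0 for `IDA`, 0.5 for `IEA`, and 1.0 for `ISS`. With many databases, a rank-based statistic may be more robust to a few very large databases than Pearson's coefficient.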
We identified several inconsistencies in GO-based annotations and provide software, GOChase-II, that corrects these semantic inconsistencies in addition to the syntactic errors previously corrected by GOChase-I.