GOChase-II: correcting semantic inconsistencies from Gene Ontology-based annotations for gene products.

Affiliation

Seoul National University Biomedical Informatics, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S40. doi: 10.1186/1471-2105-12-S1-S40.

Abstract

BACKGROUND

The Gene Ontology (GO) provides a controlled vocabulary for describing genes and gene products. Despite the undoubted importance of GO, several drawbacks of GO and GO-based annotations have been reported. We identified three types of semantic inconsistency in GO-based annotations: semantically redundant, biological-domain inconsistent, and taxonomy inconsistent annotations.

METHODS

To detect semantic inconsistencies in GO annotations, we used the hierarchical structure of the GO graph and the tree structure of the NCBI taxonomy. Twenty-seven biological databases were collected and searched for semantically inconsistent annotations.
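As an illustration of how the GO hierarchy can expose one kind of inconsistency, the sketch below flags semantically redundant annotations. It assumes that "redundant" means a gene product is annotated with a term and also with one of that term's ancestors; the abstract does not spell out the definition, so this is an assumption and not the GOChase-II implementation. The function names, the `parents` mapping, and the toy term identifiers are all hypothetical.

```python
from typing import Dict, Iterable, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def redundant_terms(annotated: Iterable[str],
                    parents: Dict[str, Set[str]]) -> Set[str]:
    """Return annotated terms that are ancestors of another annotated term.

    Assumption: such a term adds no information, because the more
    specific descendant already implies it.
    """
    annotated_set = set(annotated)
    redundant: Set[str] = set()
    for term in annotated_set:
        redundant |= ancestors(term, parents) & annotated_set
    return redundant


# Toy hierarchy with made-up identifiers (not real GO terms):
# T:leaf -> T:mid -> T:root, and T:other -> T:root
toy_parents = {
    "T:leaf": {"T:mid"},
    "T:mid": {"T:root"},
    "T:other": {"T:root"},
}

# One gene product annotated with a term, its direct parent, and a sibling branch.
flagged = redundant_terms({"T:leaf", "T:mid", "T:other"}, toy_parents)
print(sorted(flagged))
```

Here only `T:mid` is flagged, because it is an ancestor of the annotated `T:leaf`; `T:other` sits on a separate branch and is kept. A taxonomy check could follow the same pattern, walking the NCBI taxonomy tree instead of the GO graph.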

RESULTS

The distributions and possible causes of the semantic inconsistencies were investigated using twenty-seven biological databases with GO-based annotations. We found that some annotation evidence codes were associated with the inconsistencies. The numbers of gene products and species in a database, which relate to the complexity of database management, also correlated with the inconsistencies. Consequently, numerous annotation errors arise and are propagated throughout biological databases and GO-based high-level analyses. GOChase-II was developed to detect and correct both syntactic and semantic errors in GO-based annotations.

CONCLUSIONS

We identified semantic inconsistencies in GO-based annotations and provide software, GOChase-II, that corrects them in addition to the syntactic errors already corrected by GOChase-I.


