GOChase-II: correcting semantic inconsistencies from Gene Ontology-based annotations for gene products.

Author information

Seoul National University Biomedical Informatics, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S40. doi: 10.1186/1471-2105-12-S1-S40.

Abstract

BACKGROUND

The Gene Ontology (GO) provides a controlled vocabulary for describing genes and gene products. Despite the undoubted importance of GO, several drawbacks of GO and GO-based annotations have been reported. We identified three types of semantic inconsistency in GO-based annotations: semantically redundant, biological-domain inconsistent, and taxonomy inconsistent annotations.

METHODS

To detect semantic inconsistencies in GO annotations, we used the hierarchical structure of the GO graph and the tree structure of the NCBI taxonomy. Twenty-seven biological databases were collected to search for semantically inconsistent annotations.
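As a rough illustration of how the GO hierarchy and a taxonomy tree can be used for such checks, here is a minimal Python sketch. It is not the GOChase-II implementation. It assumes that a "semantically redundant" annotation is one where a gene product is annotated with both a term and one of that term's ancestors, and that a "taxonomy inconsistent" annotation is one where a term restricted to a taxon is used for a species outside that taxon. The abstract does not spell out either definition, and every term ID, taxon name, and restriction below is hypothetical toy data.

```python
# Toy data: all term IDs, taxon names, and restrictions here are hypothetical.
GO_PARENTS = {          # child term -> set of direct parent terms (a DAG)
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}
TAXON_PARENT = {        # taxon -> parent taxon (a tree; the root maps to None)
    "human": "mammals",
    "mammals": "eukaryotes",
    "yeast": "fungi",
    "fungi": "eukaryotes",
    "eukaryotes": None,
}
TERM_ONLY_IN = {"GO:D": "mammals"}  # assumed rule: term valid only in this subtree


def go_ancestors(term):
    """Return every ancestor of a term by walking parent links upward."""
    seen = set()
    stack = list(GO_PARENTS.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(GO_PARENTS.get(parent, ()))
    return seen


def redundant_terms(terms):
    """Terms already implied by a more specific term in the same annotation set."""
    terms = set(terms)
    implied = set()
    for term in terms:
        implied |= go_ancestors(term)
    return terms & implied


def in_taxon(species, taxon):
    """True if the species equals the taxon or lies below it in the tree."""
    node = species
    while node is not None:
        if node == taxon:
            return True
        node = TAXON_PARENT.get(node)
    return False


def taxonomy_inconsistent(species, terms):
    """Terms whose assumed taxon restriction excludes the given species."""
    return {
        term
        for term in terms
        if term in TERM_ONLY_IN and not in_taxon(species, TERM_ONLY_IN[term])
    }
```

For example, `redundant_terms({"GO:C", "GO:A", "GO:D"})` flags `GO:A` because it is an ancestor of both other terms, and `taxonomy_inconsistent("yeast", {"GO:D"})` flags `GO:D` because yeast lies outside the assumed mammal-only restriction.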

RESULTS

The distributions and possible causes of the semantic inconsistencies were investigated using twenty-seven biological databases with GO-based annotations. We found that some annotation evidence codes were associated with the inconsistencies. The numbers of gene products and species in a database, which relate to the complexity of database management, were also correlated with the inconsistencies. Consequently, numerous annotation errors arise and are propagated throughout biological databases and GO-based high-level analyses. GOChase-II was developed to detect and correct both syntactic and semantic errors in GO-based annotations.

CONCLUSIONS

We identified some inconsistencies in GO-based annotations and provide software, GOChase-II, that corrects these semantic inconsistencies in addition to the syntactic errors already corrected by GOChase-I.

Similar articles

1
GOChase-II: correcting semantic inconsistencies from Gene Ontology-based annotations for gene products.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S40. doi: 10.1186/1471-2105-12-S1-S40.
2
GOChase: correcting errors from Gene Ontology-based annotations for gene products.
Bioinformatics. 2005 Mar;21(6):829-31. doi: 10.1093/bioinformatics/bti106. Epub 2004 Oct 28.
3
Interspecies gene function prediction using semantic similarity.
BMC Syst Biol. 2016 Dec 23;10(Suppl 4):121. doi: 10.1186/s12918-016-0361-5.
4
A relation based measure of semantic similarity for Gene Ontology annotations.
BMC Bioinformatics. 2008 Nov 4;9:468. doi: 10.1186/1471-2105-9-468.
5
IntelliGO: a new vector-based semantic similarity measure including annotation origin.
BMC Bioinformatics. 2010 Dec 1;11:588. doi: 10.1186/1471-2105-11-588.
7
Measuring semantic similarities by combining gene ontology annotations and gene co-function networks.
BMC Bioinformatics. 2015 Feb 14;16:44. doi: 10.1186/s12859-015-0474-7.
8
Discovering gene annotations in biomedical text databases.
BMC Bioinformatics. 2008 Mar 6;9:143. doi: 10.1186/1471-2105-9-143.
9
DynGO: a tool for visualizing and mining of Gene Ontology and its associations.
BMC Bioinformatics. 2005 Aug 9;6:201. doi: 10.1186/1471-2105-6-201.
10
CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations.
Database (Oxford). 2012 Mar 20;2012:bas001. doi: 10.1093/database/bas001. Print 2012.

Cited by

1
A working taxonomy for describing the sensory differences of autism.
Mol Autism. 2023 Apr 11;14(1):15. doi: 10.1186/s13229-022-00534-1.
2
Management of Dynamic Biomedical Terminologies: Current Status and Future Challenges.
Yearb Med Inform. 2015 Aug 13;10(1):125-33. doi: 10.15265/IY-2015-002.
3
Optimization of gene set annotations via entropy minimization over variable clusters (EMVC).
Bioinformatics. 2014 Jun 15;30(12):1698-706. doi: 10.1093/bioinformatics/btu110. Epub 2014 Feb 25.
4
Measuring the evolution of ontology complexity: the gene ontology case study.
PLoS One. 2013 Oct 11;8(10):e75993. doi: 10.1371/journal.pone.0075993. eCollection 2013.
5
Evolutionary rate heterogeneity of core and attachment proteins in yeast protein complexes.
Genome Biol Evol. 2013;5(7):1366-75. doi: 10.1093/gbe/evt096.
6
The use of EST expression matrixes for the quality control of gene expression data.
PLoS One. 2012;7(3):e32966. doi: 10.1371/journal.pone.0032966. Epub 2012 Mar 8.
