Mining GO annotations for improving annotation consistency.

Affiliation

Department of Informatics, Faculty of Sciences, University of Lisbon, Lisbon, Portugal.

Publication

PLoS One. 2012;7(7):e40519. doi: 10.1371/journal.pone.0040519. Epub 2012 Jul 25.

Abstract

Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins, and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
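
The abstract does not describe the authors' algorithm in detail. However, the basic association rule learning methodology it is compared against can be sketched as follows. This is a minimal illustration, not the authors' method. It assumes each protein's molecular function annotation is a set of GO term IDs and considers only pairwise rules A → B, scored by support (the number of proteins carrying both terms) and confidence (the fraction of proteins with A that also carry B). It also flags proteins that carry A but lack B, which mirrors the idea of using mined relationships to detect inconsistent annotations. The function names, thresholds and toy term labels (`termA`, `termB`, `termC`) are hypothetical, and the sketch ignores the GO graph structure entirely.

```python
from collections import Counter
from itertools import permutations


def mine_term_rules(annotations, min_support=2, min_confidence=0.8):
    """Baseline pairwise association rule mining (hypothetical helper).

    annotations: dict mapping protein ID -> set of GO term IDs.
    Returns sorted (antecedent, consequent, support, confidence) tuples for
    rules A -> B where A and B co-occur on at least min_support proteins and
    at least min_confidence of the proteins annotated with A also carry B.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        # permutations yields both orderings (A, B) and (B, A) of every pair
        pair_counts.update(permutations(sorted(terms), 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        confidence = support / term_counts[a]
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


def find_inconsistencies(annotations, rules):
    """Flag proteins that carry a rule's antecedent but not its consequent."""
    flagged = []
    for protein, terms in sorted(annotations.items()):
        for a, b, _support, _confidence in rules:
            if a in terms and b not in terms:
                flagged.append((protein, a, b))
    return flagged


# Toy annotation data (hypothetical protein and term IDs)
TOY_ANNOTATIONS = {
    "P1": {"termA", "termB"},
    "P2": {"termA", "termB"},
    "P3": {"termA", "termB", "termC"},
    "P4": {"termA"},
    "P5": {"termB", "termC"},
}

if __name__ == "__main__":
    rules = mine_term_rules(TOY_ANNOTATIONS, min_support=2, min_confidence=0.75)
    print(rules)
    # [('termA', 'termB', 3, 0.75), ('termB', 'termA', 3, 0.75), ('termC', 'termB', 2, 1.0)]
    print(find_inconsistencies(TOY_ANNOTATIONS, rules))
    # [('P4', 'termA', 'termB'), ('P5', 'termB', 'termA')]
```

On this toy input the baseline proposes both termA → termB and termB → termA. This shows how unfiltered rule mining tends to over-generate candidate relationships, which is consistent with the low precision reported for the basic methodology. The additional filtering used by the authors' algorithm is not described in the abstract and is not reproduced here.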

Similar articles

Quality of computationally inferred gene ontology annotations.
PLoS Comput Biol. 2012 May;8(5):e1002533. doi: 10.1371/journal.pcbi.1002533. Epub 2012 May 31.

The UniProt-GO Annotation database in 2011.
Nucleic Acids Res. 2012 Jan;40(Database issue):D565-70. doi: 10.1093/nar/gkr1048. Epub 2011 Nov 28.

The GOA database: gene Ontology annotation updates for 2015.
Nucleic Acids Res. 2015 Jan;43(Database issue):D1057-63. doi: 10.1093/nar/gku1113. Epub 2014 Nov 6.

Cited by

GSAn: an alternative to enrichment analysis for annotating gene sets.
NAR Genom Bioinform. 2020 Mar 14;2(2):lqaa017. doi: 10.1093/nargab/lqaa017. eCollection 2020 Jun.
