Detecting Gene Ontology misannotations using taxon-specific rate ratio comparisons.

Author information

State Key Laboratory of Biotherapy and Cancer Center/Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China.

Department of Computational Medicine and Bioinformatics.

Publication information

Bioinformatics. 2020 Aug 15;36(16):4383-4388. doi: 10.1093/bioinformatics/btaa548.

Abstract

MOTIVATION

Many protein function databases are built on automated or semi-automated curations and can contain various annotation errors. The correction of such misannotations is critical to improving the accuracy and reliability of the databases.

RESULTS

We proposed a new approach to detect potentially incorrect Gene Ontology (GO) annotations by comparing the ratio of annotation rates (RAR) for the same GO term across different taxonomic groups; annotations with a relatively low RAR usually correspond to incorrect annotations. As an illustration, we applied the approach to 20 commonly studied species in two recent UniProt-GOA releases and identified 250 potential misannotations in the 2018-11-6 release, of which only 25% were corrected in the 2019-6-3 release. Importantly, 56% of the misannotations carry the evidence code 'Inferred from Biological aspect of Ancestor (IBA)', which contradicts previous observations that attributed misannotations mainly to 'Inferred from Sequence or structural Similarity (ISS)' and probably reflects a shift in error source due to new developments in function annotation databases. The results demonstrate a simple but efficient misannotation detection approach that is useful for large-scale comparative protein function studies.
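The abstract does not give the exact formula for RAR, so the following is only a minimal sketch of the general idea under stated assumptions: the annotation rate of a GO term in a taxonomic group is taken to be the fraction of that group's proteins carrying the term, and the RAR is taken to be the rate in one group divided by the rate in a comparison group. The function names, the input layout, and the `threshold` value are hypothetical and are not taken from the paper or the RAR web server.

```python
from collections import defaultdict


def annotation_rates(annotations, group_sizes):
    """Fraction of each group's proteins annotated with each GO term.

    annotations: iterable of (group, protein_id, go_term) tuples (assumed layout).
    group_sizes: dict mapping group -> total number of proteins in that group.
    """
    proteins = defaultdict(set)
    for group, protein_id, go_term in annotations:
        proteins[(group, go_term)].add(protein_id)  # count each protein once
    return {
        (group, term): len(ids) / group_sizes[group]
        for (group, term), ids in proteins.items()
    }


def low_rar_terms(rates, group, other, threshold=0.01):
    """Return {go_term: RAR} for terms in `group` whose rate ratio
    against `other` falls below `threshold` (an assumed cutoff)."""
    flagged = {}
    for (g, term), rate in rates.items():
        if g != group:
            continue
        other_rate = rates.get((other, term), 0.0)
        if other_rate == 0.0:
            continue  # term absent from the comparison group: ratio undefined
        rar = rate / other_rate
        if rar < threshold:
            flagged[term] = rar
    return flagged
```

With made-up toy data, a term such as GO:0005634 (nucleus) annotated on 1 of 1000 bacterial proteins but on 200 of 1000 eukaryotic proteins would give a ratio of about 0.005 and be flagged for review, whereas a term with similar rates in both groups would not. A flagged term is only a candidate misannotation and still needs manual inspection.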

AVAILABILITY AND IMPLEMENTATION

https://zhanglab.ccmb.med.umich.edu/RAR.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
