Modeling the percolation of annotation errors in a database of protein sequences.

Authors

Gilks Walter R, Audit Benjamin, De Angelis Daniela, Tsoka Sophia, Ouzounis Christos A

Affiliation

Medical Research Council Biostatistics Unit, Cambridge, UK.

Publication

Bioinformatics. 2002 Dec;18(12):1641-9. doi: 10.1093/bioinformatics/18.12.1641.

DOI: 10.1093/bioinformatics/18.12.1641
PMID: 12490449
Abstract

Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term 'error percolation'. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality.
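
The abstract does not state the model's equations, so the following is only a toy simulation of the mechanism it describes, not the authors' dynamical probabilistic model. It assumes a database that grows one entry at a time, where each new entry either is characterised experimentally (and is therefore correct) or copies its annotation from a randomly chosen existing entry. A copied annotation is correct only if the source was correct and the similarity-based transfer itself did not fail. The function name `simulate_percolation` and the parameters `transfer_error` and `p_experimental` are hypothetical and chosen purely for illustration.

```python
import random


def simulate_percolation(n_seed, n_new, transfer_error, p_experimental=0.0, seed=0):
    """Toy error-percolation sketch (illustrative assumptions, not the paper's model).

    Returns the fraction of correct annotations after each new entry is added.
    """
    rng = random.Random(seed)
    correct = [True] * n_seed      # assumption: seed entries are experimentally verified
    n_correct = n_seed
    history = []
    for _ in range(n_new):
        if rng.random() < p_experimental:
            is_correct = True      # new entry annotated from experimental evidence
        else:
            source_ok = rng.choice(correct)               # copy from a random existing entry
            transfer_ok = rng.random() >= transfer_error  # the transfer itself may go wrong
            is_correct = source_ok and transfer_ok        # a wrong source is always inherited
        correct.append(is_correct)
        n_correct += is_correct
        history.append(n_correct / len(correct))
    return history


if __name__ == "__main__":
    # Compare a few hypothetical per-transfer error rates.
    for eps in (0.0, 0.02, 0.1):
        final = simulate_percolation(n_seed=10, n_new=5000, transfer_error=eps)[-1]
        print(f"transfer_error={eps:.2f}  final fraction correct={final:.3f}")
```

In this sketch, errors enter in two ways: directly, through a failed transfer, and indirectly, by copying an entry that is already wrong. The second route is what makes the error chain-like, because the provenance of a source annotation is never checked. Raising `p_experimental` shows how much experimentally verified input this toy database would need to offset the decline.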

Similar articles

1
Modeling the percolation of annotation errors in a database of protein sequences.
Bioinformatics. 2002 Dec;18(12):1641-9. doi: 10.1093/bioinformatics/18.12.1641.
2
About the use of protein models.
Bioinformatics. 2002 Jul;18(7):934-8. doi: 10.1093/bioinformatics/18.7.934.
3
Mining sequence annotation databanks for association patterns.
Bioinformatics. 2005 Nov 1;21 Suppl 3:iii49-57. doi: 10.1093/bioinformatics/bti1206.
4
Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation.
Bioinformatics. 2003 Jul 1;19(10):1275-83. doi: 10.1093/bioinformatics/btg153.
5
Percolation of annotation errors through hierarchically structured protein sequence databases.
Math Biosci. 2005 Feb;193(2):223-34. doi: 10.1016/j.mbs.2004.08.001.
6
Protein family annotation in a multiple alignment viewer.
Bioinformatics. 2003 Mar 1;19(4):544-5. doi: 10.1093/bioinformatics/btg021.
7
Statistically rigorous automated protein annotation.
Bioinformatics. 2004 May 1;20(7):1066-73. doi: 10.1093/bioinformatics/bth039. Epub 2004 Feb 5.
8
WILMA-automated annotation of protein sequences.
Bioinformatics. 2004 Jan 1;20(1):127-8. doi: 10.1093/bioinformatics/btg380.
9
STORM towards protein function: systematic tailored ORF-data retrieval and management.
Appl Bioinformatics. 2003;2(3):177-9.
10
Automatic annotation of protein function.
Curr Opin Struct Biol. 2005 Jun;15(3):267-74. doi: 10.1016/j.sbi.2005.05.010.

Cited by

1
Artificial intelligence-based prediction of pathogen emergence and evolution in the world of synthetic biology.
Microb Biotechnol. 2024 Oct;17(10):e70014. doi: 10.1111/1751-7915.70014.
2
Integration of background knowledge for automatic detection of inconsistencies in gene ontology annotation.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i390-i400. doi: 10.1093/bioinformatics/btae246.
3
The impact of transitive annotation on the training of taxonomic classifiers.
Front Microbiol. 2024 Jan 3;14:1240957. doi: 10.3389/fmicb.2023.1240957. eCollection 2023.
4
Metallo-Beta-Lactamase-like Encoding Genes in Candidate Phyla Radiation: Widespread and Highly Divergent Proteins with Potential Multifunctionality.
Microorganisms. 2023 Jul 28;11(8):1933. doi: 10.3390/microorganisms11081933.
5
Propagation, detection and correction of errors using the sequence database network.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac416.
6
Exploring automatic inconsistency detection for literature-based gene ontology annotation.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i273-i281. doi: 10.1093/bioinformatics/btac230.
7
The curse of the uncultured fungus.
MycoKeys. 2022 Feb 2;86:177-194. doi: 10.3897/mycokeys.86.76053. eCollection 2022.
8
Automatic consistency assurance for literature-based gene ontology annotation.
BMC Bioinformatics. 2021 Nov 25;22(1):565. doi: 10.1186/s12859-021-04479-9.
9
Experimental and computational investigation of enzyme functional annotations uncovers misannotation in the EC 1.1.3.15 enzyme class.
PLoS Comput Biol. 2021 Sep 23;17(9):e1009446. doi: 10.1371/journal.pcbi.1009446. eCollection 2021 Sep.
10
GP4: an integrated Gram-Positive Protein Prediction Pipeline for subcellular localization mimicking bacterial sorting.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa302.