Modeling the percolation of annotation errors in a database of protein sequences.

Author information

Gilks Walter R, Audit Benjamin, De Angelis Daniela, Tsoka Sophia, Ouzounis Christos A

Affiliation

Medical Research Council Biostatistics Unit, Cambridge, UK.

Publication information

Bioinformatics. 2002 Dec;18(12):1641-9. doi: 10.1093/bioinformatics/18.12.1641.

Abstract

Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term 'error percolation'. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality.
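
The percolation mechanism described above can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters, not the authors' model: the function name `simulate_percolation`, the seed-database size `n_seed`, the number of additions `n_new`, and the per-transfer error probability `p_error` are all hypothetical. Each new protein copies its annotation from a randomly chosen existing entry; the copy is wrong if the source is already wrong, or if the transfer itself introduces an error.

```python
import random


def simulate_percolation(n_seed=100, n_new=10000, p_error=0.05, seed=0):
    """Toy model: return the fraction of correct annotations after each addition.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Assumption: seed entries are experimentally verified, hence correct.
    correct = [True] * n_seed
    n_correct = n_seed
    history = []
    for _ in range(n_new):
        # Annotate the new protein by similarity to a random existing entry.
        source_ok = rng.choice(correct)
        # The transfer itself fails with probability p_error.
        transfer_ok = rng.random() >= p_error
        # An error in the source propagates regardless of the transfer.
        new_ok = source_ok and transfer_ok
        correct.append(new_ok)
        n_correct += new_ok
        history.append(n_correct / len(correct))
    return history


if __name__ == "__main__":
    history = simulate_percolation()
    print(f"fraction correct after 100 additions:   {history[99]:.3f}")
    print(f"fraction correct after 10000 additions: {history[-1]:.3f}")
```

In this sketch no step ever re-verifies an annotation, so the expected fraction of correct entries falls with every addition. That matches, qualitatively only, the systematic deterioration of database quality that the abstract reports.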
