Biases in the experimental annotations of protein function and their effect on our understanding of protein function space.

Affiliation

Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, USA.

Publication information

PLoS Comput Biol. 2013;9(5):e1003063. doi: 10.1371/journal.pcbi.1003063. Epub 2013 May 30.

Abstract

The ongoing functional annotation of proteins relies on curators who capture experimental findings from the scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate how prevalent the "few articles - many proteins" phenomenon is. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass spectrometry, microscopy, and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments mostly concerns the subcellular location of proteins and their participation in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps substantially. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist, and understanding their characteristics and extent, is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
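As an illustration of the "few articles - many proteins" measurement, the following minimal Python sketch computes what fraction of articles is needed to account for a given share of annotated proteins. It is not the authors' pipeline: the function name `top_article_share`, the `(protein_id, article_id)` input format, and the toy identifiers are all assumptions made for this example, and articles are simply ranked by how many distinct proteins they annotate.

```python
from collections import defaultdict


def top_article_share(annotations, protein_fraction=0.25):
    """Fraction of articles needed to cover `protein_fraction` of all proteins.

    `annotations` is an iterable of (protein_id, article_id) pairs
    (a hypothetical format, not the UniProt-GOA file layout).
    Articles are taken in descending order of distinct proteins annotated,
    which is a simple ranking rather than an optimal set cover.
    """
    proteins_by_article = defaultdict(set)
    for protein_id, article_id in annotations:
        proteins_by_article[article_id].add(protein_id)

    all_proteins = set().union(*proteins_by_article.values())
    target = protein_fraction * len(all_proteins)

    # Largest articles first; sorted() is stable, so ties keep input order.
    ranked = sorted(proteins_by_article.values(), key=len, reverse=True)

    covered = set()
    for n_articles, proteins in enumerate(ranked, start=1):
        covered |= proteins
        if len(covered) >= target:
            return n_articles / len(ranked)
    return 1.0


# Toy data: one high-throughput article annotating six proteins,
# four low-throughput articles annotating one protein each.
toy = [(f"P{i}", "HT1") for i in range(1, 7)]
toy += [("P7", "LT1"), ("P8", "LT2"), ("P9", "LT3"), ("P10", "LT4")]

share = top_article_share(toy)
print(f"{share:.0%} of articles cover 25% of proteins")
```

In the toy data, a single article out of five already covers more than a quarter of the ten proteins, which mirrors on a small scale the skew the abstract reports for UniProt-GOA.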

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02b3/3667760/68d285da91e0/pcbi.1003063.g001.jpg
