

UniProtKB amid the turmoil of plant proteomics research.

Affiliation

Swiss-Prot, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire Geneva, Switzerland.

Publication information

Front Plant Sci. 2012 Dec 6;3:270. doi: 10.3389/fpls.2012.00270. eCollection 2012.

Abstract

The UniProt KnowledgeBase (UniProtKB) provides a single, centralized, authoritative resource for protein sequences and functional information. The majority of its records are based on the automatic translation of coding sequences (CDS) provided by submitters at the time of initial deposition to the nucleotide sequence databases (INSDC). This article gives a general overview of the current situation, with some specific illustrations taken from our annotation of the Arabidopsis and rice proteomes. Increasingly often, only the raw sequence of a complete genome is deposited in the nucleotide sequence databases, while the gene model predictions and annotations are kept in separate, specialized model organism databases (MODs). To be able to provide the complete proteomes of model organisms, UniProtKB had to implement pipelines for importing protein sequences from Ensembl and EnsemblGenomes. A single genome can be the target of several unrelated sequencing projects, and the final assemblies and gene model predictions may diverge quite significantly. In addition, several cultivars of the same species are often sequenced (the sequencing of 1001 Arabidopsis cultivars is currently under way), and the resulting proteomes are far from identical. Therefore, one challenge for UniProtKB is to store and organize these data conveniently and to clearly define the reference proteomes that should be made available to users. Manual annotation is one of the hallmarks of the Swiss-Prot section of UniProtKB. Besides adding functional annotation, curators check, and often correct, gene model predictions. For plants, this task is limited to Arabidopsis thaliana and Oryza sativa subsp. japonica. Proteomics data that provide experimental evidence confirming the existence of proteins, or that identify sequence features such as post-translational modifications, are also imported into UniProtKB records, and the knowledgebase is cross-referenced to numerous proteomics resources.
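To make "automatic translation of coding sequences" concrete, here is a minimal sketch of translating a CDS into a protein sequence. It is not UniProtKB's actual pipeline. It assumes the standard genetic code (NCBI translation table 1), a complete CDS whose length is a multiple of three, and translation that stops at the first stop codon. Real pipelines also handle alternative genetic codes, alternative start codons and partial CDS, which this sketch ignores. The helper name `translate_cds` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): codons enumerated in
# T, C, A, G order for the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Hypothetical helper: translate a complete CDS with the standard code.

    Assumes the CDS is in frame from its first base; unknown codons
    (e.g. containing N) become 'X'; translation stops at the first stop codon.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))  # prints MA
```

Because a different gene model (for example another start codon or intron boundary) changes the CDS, it also changes the translated protein. This is why the curator checks of gene model predictions described above matter.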


