RefSeq: an update on prokaryotic genome annotation and curation.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA.

Publication information

Nucleic Acids Res. 2018 Jan 4;46(D1):D851-D860. doi: 10.1093/nar/gkx1068.

Abstract

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule: BlastRules. Continual curation of supporting evidence and propagation of improved names onto RefSeq proteins ensure that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature and available for download and reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.
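To make the idea of a hierarchical evidence scheme concrete, the following is a minimal sketch, assuming that each candidate protein name comes from an evidence source with a fixed precedence rank and that the highest-ranked source wins, with ties broken by match score. The tier names (`blast_rule`, `domain_architecture`, `hmm_equivalog`, `hmm_subfamily`), their order, and the helpers `EvidenceHit` and `assign_name` are hypothetical illustrations, not PGAP's actual configuration or code.

```python
from dataclasses import dataclass

# Hypothetical precedence table: lower rank = stronger evidence.
# Tier names and ordering are illustrative assumptions, not PGAP's real hierarchy.
PRECEDENCE = {
    "blast_rule": 0,
    "domain_architecture": 1,
    "hmm_equivalog": 2,
    "hmm_subfamily": 3,
}


@dataclass(frozen=True)
class EvidenceHit:
    source: str   # evidence type; only keys of PRECEDENCE are considered
    name: str     # protein name this evidence would assign
    score: float  # match score; higher is better within the same tier


def assign_name(hits: list[EvidenceHit], default: str = "hypothetical protein") -> str:
    """Pick the name from the highest-precedence hit, breaking ties by higher score."""
    ranked = [h for h in hits if h.source in PRECEDENCE]
    if not ranked:
        # No usable evidence: fall back to a generic name.
        return default
    best = min(ranked, key=lambda h: (PRECEDENCE[h.source], -h.score))
    return best.name


if __name__ == "__main__":
    hits = [
        EvidenceHit("hmm_subfamily", "ABC transporter ATP-binding protein", 250.0),
        EvidenceHit("blast_rule", "ABC transporter ATP-binding protein Foo", 90.0),
    ]
    print(assign_name(hits))
```

Ranking by a `(precedence, -score)` tuple keeps the rule simple: a weaker-tier hit can never override a stronger tier, no matter how high its score, which mirrors the intent of a hierarchy rather than a single score threshold.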
