National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. doi: 10.1093/nar/gkt1114. Epub 2013 Nov 19.
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.