GENCODE: the reference human genome annotation for The ENCODE Project.

Affiliation

Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

Publication information

Genome Res. 2012 Sep;22(9):1760-74. doi: 10.1101/gr.135350.111.

Abstract

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternatively spliced transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9,640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to PeptideAtlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3,689 new loci not currently in GENCODE, of which 3,127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
