AceView: a comprehensive cDNA-supported gene and transcripts annotation.

Author information

Thierry-Mieg Danielle, Thierry-Mieg Jean

Affiliation

National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.

Publication information

Genome Biol. 2006;7 Suppl 1(Suppl 1):S12.1-14. doi: 10.1186/gb-2006-7-s1-s12. Epub 2006 Aug 7.

Abstract

BACKGROUND

Regions covering one percent of the genome, selected by ENCODE for extensive analysis, were annotated by the HAVANA/Gencode group with high quality transcripts, thus defining a benchmark. The ENCODE Genome Annotation Assessment Project (EGASP) competition aimed at reproducing Gencode and finding new genes. The organizers evaluated the protein predictions in depth. We present a complementary analysis of the mRNAs, including alternative transcript variants.

RESULTS

We evaluate 25 gene tracks from the University of California Santa Cruz (UCSC) genome browser. We either distinguish or collapse the alternative splice variants, and compare the genomic coordinates of exons, introns and nucleotides. Whole mRNA models, seen as chains of introns, are sorted to find the best matching pairs, and compared so that each mRNA is used only once. At the mRNA level, AceView is by far the closest to Gencode: the vast majority of transcripts of the two methods, including alternative variants, are identical. At the protein level, however, due to a lack of experimental data, our predictions differ: Gencode annotates proteins in only 41% of the mRNAs whereas AceView does so in virtually all. We describe the driving principles of AceView, and how, by performing hand-supervised automatic annotation, we solve the combinatorial splicing problem and summarize all of GenBank, dbEST and RefSeq into a genome-wide non-redundant but comprehensive cDNA-supported transcriptome. AceView accuracy is now validated by Gencode.
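The whole-mRNA comparison described above (transcripts reduced to chains of introns, candidate pairs sorted, each mRNA used only once) can be illustrated with a minimal sketch. The abstract does not give the actual scoring or tie-breaking rules, so everything here is an assumption made for illustration: the function names `intron_chain` and `match_one_to_one` are hypothetical, coordinates are taken as 1-based inclusive, and pairs are ranked by exact chain identity first and number of shared introns second, then assigned greedily.

```python
from typing import Dict, List, Tuple

Intron = Tuple[str, str, int, int]   # (chromosome, strand, start, end)
Chain = Tuple[Intron, ...]


def intron_chain(chrom: str, strand: str, exons: List[Tuple[int, int]]) -> Chain:
    """Derive the ordered introns lying between consecutive exons.

    Assumes 1-based inclusive exon coordinates (an illustrative choice).
    A single-exon transcript yields an empty chain.
    """
    exons = sorted(exons)
    return tuple(
        (chrom, strand, left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    )


def match_one_to_one(
    reference: Dict[str, Chain], predicted: Dict[str, Chain]
) -> List[Tuple[str, str, bool]]:
    """Pair reference and predicted mRNAs so that each is used at most once.

    Candidate pairs sharing at least one intron are sorted best first
    (identical chains, then most shared introns) and taken greedily.
    This ranking is an assumption, not the published procedure.
    """
    candidates = []
    for ref_id, ref_chain in reference.items():
        ref_introns = set(ref_chain)
        for pred_id, pred_chain in predicted.items():
            shared = len(ref_introns & set(pred_chain))
            if shared:
                identical = ref_chain == pred_chain
                candidates.append((identical, shared, ref_id, pred_id))

    # Best pairs first; ids break ties so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2], c[3]))

    used_ref, used_pred, pairs = set(), set(), []
    for identical, _shared, ref_id, pred_id in candidates:
        if ref_id in used_ref or pred_id in used_pred:
            continue  # each mRNA may appear in only one pair
        used_ref.add(ref_id)
        used_pred.add(pred_id)
        pairs.append((ref_id, pred_id, identical))
    return pairs


# Toy data, invented for this example only.
reference = {
    "R1": intron_chain("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
    "R2": intron_chain("chr1", "+", [(100, 200), (500, 600)]),
}
predicted = {
    "P1": intron_chain("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
    "P2": intron_chain("chr1", "+", [(100, 200), (300, 400), (550, 600)]),
}
pairs = match_one_to_one(reference, predicted)
```

In this toy case `P2` shares one intron with `R1`, but `R1` is already paired with its identical match `P1`, so `P2` stays unmatched. Single-exon transcripts have no introns and are ignored by this sketch; they would need a separate exon-overlap rule.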

CONCLUSION

Relative to a consensus mRNA catalog constructed from all evidence-based annotations, Gencode and AceView have 81% and 84% sensitivity, and 74% and 73% specificity, respectively. This close agreement validates a richer view of the human transcriptome, with three to five times more transcripts than in UCSC Known Genes (sensitivity 28%), RefSeq (sensitivity 21%) or Ensembl (sensitivity 19%).
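As a rough illustration of how such figures can be computed, the sketch below scores one annotation against a consensus catalog. The abstract does not state the formulas, so this assumes the convention common in gene-prediction evaluation: sensitivity is the fraction of consensus mRNAs recovered by the annotation, and specificity is the fraction of annotated mRNAs found in the consensus. The function name `sensitivity_specificity` and the toy identifiers are hypothetical and do not reproduce the paper's data.

```python
from typing import Hashable, Set, Tuple


def sensitivity_specificity(
    annotated: Set[Hashable], consensus: Set[Hashable]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of an annotation versus a consensus.

    Assumed definitions (gene-prediction convention, not taken from the source):
      sensitivity = shared / size of consensus
      specificity = shared / size of annotation
    Each element stands for one mRNA model, for example its intron chain.
    """
    shared = len(annotated & consensus)
    sensitivity = shared / len(consensus) if consensus else 0.0
    specificity = shared / len(annotated) if annotated else 0.0
    return sensitivity, specificity


# Toy identifiers, invented for this example only.
consensus_catalog = {"m1", "m2", "m3", "m4"}
annotation = {"m1", "m2", "m3", "x9"}
sens, spec = sensitivity_specificity(annotation, consensus_catalog)
```

Under these definitions an annotation with few transcripts can score high specificity but low sensitivity, which fits the pattern the abstract reports for catalogs with three to five times fewer transcripts.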

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bdc/1810549/32ef00f2d63f/gb-2006-7-s1-s12-1.jpg
