Incorporating RNA-seq data into the zebrafish Ensembl genebuild.

Affiliation

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.

Publication information

Genome Res. 2012 Oct;22(10):2067-78. doi: 10.1101/gr.137901.112. Epub 2012 Jul 12.

Abstract

Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component that can be cost-effectively achieved using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3'-end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8847/3460200/b63e2ce1ca7b/2067fig1.jpg
