Incorporating RNA-seq data into the zebrafish Ensembl genebuild.

Affiliation

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.

Publication information

Genome Res. 2012 Oct;22(10):2067-78. doi: 10.1101/gr.137901.112. Epub 2012 Jul 12.

DOI:10.1101/gr.137901.112

PMID:22798491

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3460200/

Abstract

Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component that can be cost-effectively achieved using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3'-end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.
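
As a rough reading aid for the figures quoted in the abstract, the following sketch derives two numbers that the abstract does not state directly: the approximate count of RNA-seq models refined by the 3'-end data, and the mean number of transcripts per gene in the merged annotation. The constant and function names are illustrative assumptions, and the derived values are approximations computed only from the rounded figures above, not results reported by the authors.

```python
# Figures quoted in the abstract (Ensembl version 62, Zv9 reference).
RNASEQ_GENE_MODELS = 25_748      # gene models in the RNA-seq-only build
REFINED_FRACTION = 0.461         # share of models refined using 3'-end data
MERGED_GENES = 26_152            # genes in the merged Ensembl/VEGA annotation
MERGED_TRANSCRIPTS = 51_569      # transcripts in the merged annotation


def refined_model_count(total_models: int, fraction: float) -> int:
    """Approximate number of models refined (derived, not reported)."""
    return round(total_models * fraction)


def transcripts_per_gene(transcripts: int, genes: int) -> float:
    """Mean transcripts per gene (derived, not reported)."""
    return transcripts / genes


if __name__ == "__main__":
    refined = refined_model_count(RNASEQ_GENE_MODELS, REFINED_FRACTION)
    ratio = transcripts_per_gene(MERGED_TRANSCRIPTS, MERGED_GENES)
    print(f"Approx. refined RNA-seq models: {refined}")
    print(f"Mean transcripts per gene: {ratio:.2f}")
```

Under these assumptions, roughly 11,870 of the 25,748 RNA-seq models were refined, and the merged annotation averages about two transcripts per gene.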

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8847/3460200/b63e2ce1ca7b/2067fig1.jpg

Similar articles

An improved zebrafish transcriptome annotation for sensitive and comprehensive detection of cell type-specific genes.

Elife. 2020 Aug 24;9:e55792. doi: 10.7554/eLife.55792.

GASS: genome structural annotation for Eukaryotes based on species similarity.

BMC Genomics. 2015 Mar 4;16(1):150. doi: 10.1186/s12864-015-1353-3.

A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification.

BMC Genomics. 2015 Feb 18;16(1):97. doi: 10.1186/s12864-015-1308-8.

GENCODE: the reference human genome annotation for The ENCODE Project.

Genome Res. 2012 Sep;22(9):1760-74. doi: 10.1101/gr.135350.111.

AceView: a comprehensive cDNA-supported gene and transcripts annotation.

Genome Biol. 2006;7 Suppl 1(Suppl 1):S12.1-14. doi: 10.1186/gb-2006-7-s1-s12. Epub 2006 Aug 7.

A unified gene catalog for the laboratory mouse reference genome.

Mamm Genome. 2015 Aug;26(7-8):295-304. doi: 10.1007/s00335-015-9571-1. Epub 2015 Jun 18.

A high-quality annotated transcriptome of swine peripheral blood.

BMC Genomics. 2017 Jun 24;18(1):479. doi: 10.1186/s12864-017-3863-7.

Assessing the impact of human genome annotation choice on RNA-seq expression estimates.

BMC Bioinformatics. 2013;14 Suppl 11(Suppl 11):S8. doi: 10.1186/1471-2105-14-S11-S8. Epub 2013 Nov 4.

Mining Arabidopsis thaliana RNA-seq data with Integrated Genome Browser reveals stress-induced alternative splicing of the putative splicing regulator SR45a.

Am J Bot. 2012 Feb;99(2):219-31. doi: 10.3732/ajb.1100355. Epub 2012 Jan 30.

Cited by

High resolution of full-length RNA sequencing deciphers massive transcriptome complexity during zebrafish embryogenesis.

BMC Biol. 2025 Jun 4;23(1):155. doi: 10.1186/s12915-025-02271-2.

Transposable Elements Drive Regulatory and Functional Innovation of F-box Genes.

Mol Biol Evol. 2025 Apr 30;42(5). doi: 10.1093/molbev/msaf097.

Conserved glucokinase regulation in zebrafish confirms therapeutic utility for pharmacologic modulation in diabetes.

Commun Biol. 2024 Nov 23;7(1):1557. doi: 10.1038/s42003-024-07264-5.

Discovering microproteins: making the most of ribosome profiling data.

RNA Biol. 2023 Jan;20(1):943-954. doi: 10.1080/15476286.2023.2279845. Epub 2023 Nov 27.

A Baseline for Skeletal Investigations in Medaka (Oryzias latipes): The Effects of Rearing Density on the Postcranial Phenotype.

Front Endocrinol (Lausanne). 2022 Jun 30;13:893699. doi: 10.3389/fendo.2022.893699. eCollection 2022.

The rise of genomics in snake venom research: recent advances and future perspectives.

Gigascience. 2022 Apr 1;11. doi: 10.1093/gigascience/giac024.

LncRNA VEAL2 regulates PRKCB2 to modulate endothelial permeability in diabetic retinopathy.

EMBO J. 2021 Aug 2;40(15):e107134. doi: 10.15252/embj.2020107134. Epub 2021 Jun 28.

Identifying the Related Genes of Muscle Growth and Exploring the Functions by Compensatory Growth in Mandarin Fish (Siniperca chuatsi).

Front Physiol. 2020 Sep 25;11:553563. doi: 10.3389/fphys.2020.553563. eCollection 2020.

Shedding new light on early sex determination in zebrafish.

Arch Toxicol. 2020 Dec;94(12):4143-4158. doi: 10.1007/s00204-020-02915-y. Epub 2020 Sep 25.

Maternal Larp6 controls oocyte development, chorion formation and elevation.

Development. 2020 Feb 26;147(4):dev187385. doi: 10.1242/dev.187385.
