FRAMA: from RNA-seq data to annotated mRNA assemblies.

Authors

Bens Martin, Sahm Arne, Groth Marco, Jahn Niels, Morhart Michaela, Holtze Susanne, Hildebrandt Thomas B, Platzer Matthias, Szafranski Karol

Affiliations

Leibniz Institute on Ageing - Fritz Lipmann Institute, Beutenbergstr. 11, 07745, Jena, Germany.

Leibniz Institute for Zoo and Wildlife Research, Alfred-Kowalke-Straße 17, 10315, Berlin, Germany.

Publication information

BMC Genomics. 2016 Jan 14;17:54. doi: 10.1186/s12864-015-2349-8.

DOI:10.1186/s12864-015-2349-8

PMID:26763976

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4712544/

Abstract

BACKGROUND

Advances in second-generation sequencing of RNA made a near-complete characterization of transcriptomes affordable. However, the reconstruction of full-length mRNAs via de novo RNA-seq assembly is still difficult due to the complexity of eukaryote transcriptomes with highly similar paralogs and multiple alternative splice variants. Here, we present FRAMA, a genome-independent annotation tool for de novo mRNA assemblies that addresses several post-assembly tasks, such as reduction of contig redundancy, ortholog assignment, correction of misassembled transcripts, scaffolding of fragmented transcripts and coding sequence identification.

RESULTS

We applied FRAMA to assemble and annotate the transcriptome of the naked mole-rat and assess the quality of the obtained compilation of transcripts with the aid of publicly available naked mole-rat gene annotations. Based on a de novo transcriptome assembly (Trinity), FRAMA annotated 21,984 naked mole-rat mRNAs (12,100 full-length CDSs), corresponding to 16,887 genes. The scaffolding of 3488 genes increased the median sequence information 1.27-fold. In total, FRAMA detected and corrected 4774 misassembled genes, which were predominantly caused by fusion of genes. A comparison with three different sources of naked mole-rat transcripts reveals that FRAMA's gene models are better supported by RNA-seq data than any other transcript set. Further, our results demonstrate the competitiveness of FRAMA with state-of-the-art genome-based transcript reconstruction approaches.

CONCLUSION

FRAMA realizes the de novo construction of a low-redundant transcript catalog for eukaryotes, including the extension and refinement of transcripts. Thereby, results delivered by FRAMA provide the basis for comprehensive downstream analyses like gene expression studies or comparative transcriptomics. FRAMA is available at https://github.com/gengit/FRAMA .

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11f1/4712544/d7a9e5cb217b/12864_2015_2349_Fig1_HTML.jpg

Similar articles

Transcriptome assembly, gene annotation and tissue gene expression atlas of the rainbow trout.

PLoS One. 2015 Mar 20;10(3):e0121778. doi: 10.1371/journal.pone.0121778. eCollection 2015.

Challenges and advances for transcriptome assembly in non-model species.

PLoS One. 2017 Sep 20;12(9):e0185020. doi: 10.1371/journal.pone.0185020. eCollection 2017.

A high-quality annotated transcriptome of swine peripheral blood.

BMC Genomics. 2017 Jun 24;18(1):479. doi: 10.1186/s12864-017-3863-7.

Improved annotation with de novo transcriptome assembly in four social amoeba species.

BMC Genomics. 2017 Jan 31;18(1):120. doi: 10.1186/s12864-017-3505-0.

Efficient assembly and annotation of the transcriptome of catfish by RNA-Seq analysis of a doubled haploid homozygote.

BMC Genomics. 2012 Nov 5;13:595. doi: 10.1186/1471-2164-13-595.

De novo Transcriptome Assemblies of Rana (Lithobates) catesbeiana and Xenopus laevis Tadpole Livers for Comparative Genomics without Reference Genomes.

PLoS One. 2015 Jun 29;10(6):e0130720. doi: 10.1371/journal.pone.0130720. eCollection 2015.

Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments.

Mol Ecol. 2013 Feb;22(3):620-34. doi: 10.1111/mec.12014. Epub 2012 Sep 24.

The central nervous system transcriptome of the weakly electric brown ghost knifefish (Apteronotus leptorhynchus): de novo assembly, annotation, and proteomics validation.

BMC Genomics. 2015 Mar 11;16(1):166. doi: 10.1186/s12864-015-1354-2.

Characterization of 954 bovine full-CDS cDNA sequences.

BMC Genomics. 2005 Nov 23;6:166. doi: 10.1186/1471-2164-6-166.

Cited by

Unveiling of brain transcriptome of masked palm civet (Paguma larvata) with chronic infection of Toxoplasma gondii.

Parasit Vectors. 2022 Jul 24;15(1):263. doi: 10.1186/s13071-022-05378-5.

Characterization of naked mole-rat hematopoiesis reveals unique stem and progenitor cell patterns and neotenic traits.

EMBO J. 2022 Aug 1;41(15):e109694. doi: 10.15252/embj.2021109694. Epub 2022 Jun 13.

Ecological Specialization and Evolutionary Reticulation in Extant Hyaenidae.

Mol Biol Evol. 2021 Aug 23;38(9):3884-3897. doi: 10.1093/molbev/msab055.

Alternative Animal Models of Aging Research.

Front Mol Biosci. 2021 May 17;8:660959. doi: 10.3389/fmolb.2021.660959. eCollection 2021.

Abundance and size of hyaluronan in naked mole-rat tissues and plasma.

Sci Rep. 2021 Apr 12;11(1):7951. doi: 10.1038/s41598-021-86967-9.

A common phytoene synthase mutation underlies white petal varieties of the California poppy.

Sci Rep. 2019 Aug 12;9(1):11615. doi: 10.1038/s41598-019-48122-3.

Analysis of the coding sequences of clownfish reveals molecular convergence in the evolution of lifespan.

BMC Evol Biol. 2019 Apr 11;19(1):89. doi: 10.1186/s12862-019-1409-0.

Higher gene expression stability during aging in long-lived giant mole-rats than in short-lived rats.

Aging (Albany NY). 2018 Dec 16;10(12):3938-3956. doi: 10.18632/aging.101683.

CAARS: comparative assembly and annotation of RNA-Seq data.

Bioinformatics. 2019 Jul 1;35(13):2199-2207. doi: 10.1093/bioinformatics/bty903.

Naked mole-rat transcriptome signatures of socially suppressed sexual maturation and links of reproduction to aging.

BMC Biol. 2018 Aug 2;16(1):77. doi: 10.1186/s12915-018-0546-z.
