Bens Martin, Sahm Arne, Groth Marco, Jahn Niels, Morhart Michaela, Holtze Susanne, Hildebrandt Thomas B, Platzer Matthias, Szafranski Karol
Leibniz Institute on Ageing - Fritz Lipmann Institute, Beutenbergstr. 11, 07745, Jena, Germany.
Leibniz Institute for Zoo and Wildlife Research, Alfred-Kowalke-Straße 17, 10315, Berlin, Germany.
BMC Genomics. 2016 Jan 14;17:54. doi: 10.1186/s12864-015-2349-8.
Advances in second-generation sequencing of RNA have made a near-complete characterization of transcriptomes affordable. However, the reconstruction of full-length mRNAs via de novo RNA-seq assembly is still difficult due to the complexity of eukaryotic transcriptomes, with their highly similar paralogs and multiple alternative splice variants. Here, we present FRAMA, a genome-independent annotation tool for de novo mRNA assemblies that addresses several post-assembly tasks, such as reduction of contig redundancy, ortholog assignment, correction of misassembled transcripts, scaffolding of fragmented transcripts and coding sequence identification.
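Of the post-assembly tasks listed above, coding sequence identification is the simplest to illustrate in isolation. The following is a minimal sketch of a generic longest-ORF scan over the forward strand of a contig. It is not FRAMA's method, which this abstract does not describe in detail; the function name `longest_orf` and the ATG-to-stop definition of an ORF are assumptions made purely for illustration.

```python
# Illustrative only: a generic longest-ORF scan, not FRAMA's CDS inference.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, frame) of the longest ATG..stop ORF on the forward strand.

    `start` is the 0-based index of the ATG, `end` is exclusive and includes
    the stop codon. Returns None when no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, frame)
                start = None  # resume searching after this stop codon
    return best
```

A real annotation pipeline would also have to handle the reverse strand, ORFs truncated at contig ends, and evidence from orthologous sequences, all of which this sketch ignores.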
We applied FRAMA to assemble and annotate the transcriptome of the naked mole-rat and assessed the quality of the resulting transcript compilation with the aid of publicly available naked mole-rat gene annotations. Based on a de novo transcriptome assembly (Trinity), FRAMA annotated 21,984 naked mole-rat mRNAs (12,100 full-length CDSs), corresponding to 16,887 genes. The scaffolding of 3,488 genes increased the median sequence information 1.27-fold. In total, FRAMA detected and corrected 4,774 misassembled genes, which were predominantly caused by fusion of genes. A comparison with three different sources of naked mole-rat transcripts reveals that FRAMA's gene models are better supported by RNA-seq data than any other transcript set. Further, our results demonstrate that FRAMA is competitive with state-of-the-art genome-based transcript reconstruction approaches.
FRAMA enables the de novo construction of a low-redundancy transcript catalog for eukaryotes, including the extension and refinement of transcripts. The results delivered by FRAMA thereby provide the basis for comprehensive downstream analyses such as gene expression studies or comparative transcriptomics. FRAMA is available at https://github.com/gengit/FRAMA.