Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA.
Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK, 74078, USA.
BMC Genomics. 2017 Oct 17;18(1):796. doi: 10.1186/s12864-017-4147-y.
Manduca sexta is a large lepidopteran insect widely used as a model for studying the biochemistry of insect physiological processes. As part of its genome project, over 50 cDNA libraries have been analyzed to profile gene expression in different tissues and life stages. Although the RNA-seq data have been used to study genes related to cuticle structure, chitin metabolism and immunity, much of the information has not yet been mined to understand the basic molecular biology of this model insect. Indeed, basic features of these data, such as the composition of the RNA-seq reads and the lists of library-correlated genes, remain unclear. More broadly, clear-cut spatiotemporal expression data are rare for insects, the largest group of animals, including Drosophila and mosquitoes, mainly because of their small body size.
We obtained the transcriptome data, analyzed the raw reads in relation to the assembled genome, and generated heatmaps for clustered genes. Library characteristics (tissue, stage), the number of mapped bases, and the sequencing method all affected the observed percentage of the genome that was transcribed. Although up to 40% of the reads were not mapped to the genome in the initial Cufflinks gene modeling, we identified the causes of the mapping failure and reduced the proportion of non-mappable reads to <8%. Similarities between libraries, measured using library-correlated genes, clearly distinguished tissues and life stages. We calculated gene expression levels and analyzed the most abundantly expressed genes in the libraries. Furthermore, we analyzed tissue-specific gene expression and identified 18 groups of genes with distinct expression patterns.
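As a rough illustration of two quantities mentioned above, the share of non-mappable reads and the similarity between libraries, the following minimal Python sketch may help. It is not the authors' pipeline: the function names, the toy expression values, and the choice of Pearson correlation on log2-transformed values are all assumptions made for illustration, and the abstract does not specify how library similarity was actually computed.

```python
import math


def percent_unmapped(total_reads, mapped_reads):
    """Return the percentage of reads that did not map to the genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * (total_reads - mapped_reads) / total_reads


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def library_similarity(expression):
    """Pairwise correlation between libraries.

    `expression` maps a library name to a list of expression values,
    one per gene, with the same gene order in every library.
    Values are log2(x + 1) transformed before correlating (an assumption).
    """
    logged = {
        lib: [math.log2(v + 1.0) for v in values]
        for lib, values in expression.items()
    }
    libs = sorted(logged)
    return {
        (a, b): pearson(logged[a], logged[b])
        for i, a in enumerate(libs)
        for b in libs[i + 1:]
    }


# Hypothetical toy data: three libraries, four genes (not from the study).
toy_expression = {
    "fat_body_a": [10.0, 200.0, 5.0, 0.0],
    "fat_body_b": [12.0, 180.0, 6.0, 0.0],
    "midgut": [300.0, 2.0, 0.0, 150.0],
}
similarity = library_similarity(toy_expression)
```

In this toy setup the two fat body libraries correlate much more strongly with each other than either does with the midgut library, which is the kind of contrast a clustered heatmap of library similarities makes visible.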
We performed a thorough analysis of the 67 RNA-seq datasets to characterize new genomic features of M. sexta. Integrated knowledge of gene functions and expression features will facilitate future functional studies in this biochemical model insect.