Hoang Nam V, Furtado Agnelo, Perlo Virginie, Botha Frederik C, Henry Robert J
College of Agriculture and Forestry, Hue University, Hue, Vietnam.
Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, QLD, Australia.
Front Genet. 2019 Jul 23;10:654. doi: 10.3389/fgene.2019.00654. eCollection 2019.
Normalization of cDNA is widely used to improve the coverage of rare transcripts in transcriptome analyses that employ next-generation sequencing. Recently, long-read technology has emerged as a powerful tool for sequencing and constructing transcriptomes, especially for complex genomes that contain highly similar transcripts and spliced transcript isoforms. Here, we analyzed the transcriptome of sugarcane, a highly polyploid plant genome, using PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, one with and one without a normalization step. The results demonstrated that, although the two libraries shared many of the same transcripts, normalization removed many longer transcripts and detected many new, generally shorter ones. For the same input cDNA and data yield, the normalized library recovered more transcript isoforms in total, as well as more predicted gene families and orthologous groups, giving a greater representation of the sugarcane transcriptome than the non-normalized library. The non-normalized library, on the other hand, covered a wider range of transcript lengths, with more transcripts longer than ∼1.25 kb, and had more transcript isoforms per gene family and more gene ontology terms per transcript. A large proportion of the unique transcripts, comprising ∼52% of the normalized library, were expressed at lower levels than the unique transcripts from the non-normalized library across the three tissue types tested (leaf, stalk, and root). About 83% of the 5,348 predicted long noncoding transcripts were derived from the normalized library, and ∼80% of these came from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library was enriched for different functional fractions of transcripts. This demonstrated that the two approaches are complementary for obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.