Center for Bioinformatics and Computational Biology, Institute of Biomedical Sciences, School of Life Science, East China Normal University, Shanghai 200241, China.
Sci China Life Sci. 2011 Dec;54(12):1129-33. doi: 10.1007/s11427-011-4256-9. Epub 2012 Jan 7.
De novo transcriptome assembly is an important approach in RNA-Seq data analysis: it allows the transcriptome to be reconstructed and gene expression profiles to be investigated without a reference genome sequence. We carried out transcriptome assemblies with two RNA-Seq datasets, one generated from human brain and one from a cell line. We then compared three strategies to identify an efficient way to obtain an optimal overall assembly. First, we assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of values for k-mer length and coverage cutoff during assembly. Lastly, we combined the contigs assembled over a range of k values to generate a final assembly. Comparing these results, we found that a single k-mer value is not enough to generate a good assembly, whereas combining the contigs from different k-mer values yields longer contigs and greatly improves the overall assembly.
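The third strategy, pooling contigs assembled at several k-mer lengths, can be illustrated with a minimal Python sketch. The abstract does not say which assembler was used or how the contigs were merged, so everything here is an assumption made for illustration: the function names `merge_contigs` and `n50`, the `min_len` parameter, and the merging rule itself. The rule simply drops contigs that are fully contained in a longer one, on either strand. It does not extend contigs by joining overlaps, which a real merging step might do.

```python
from __future__ import annotations

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_contigs(assemblies: dict[int, list[str]], min_len: int = 100) -> list[str]:
    """Pool contigs from several k-mer assemblies and remove redundancy.

    `assemblies` maps a k-mer length to the contigs assembled with it
    (a hypothetical input layout, not taken from the paper). A contig is
    dropped when it, or its reverse complement, is a substring of a
    longer contig that has already been kept.
    """
    # Pool every contig that passes the length filter; the set removes
    # exact duplicates produced at different k values.
    pooled = {
        contig.upper()
        for contigs in assemblies.values()
        for contig in contigs
        if len(contig) >= min_len
    }
    kept: list[str] = []
    # Visit the longest contigs first so shorter ones are tested against them.
    for contig in sorted(pooled, key=lambda s: (-len(s), s)):
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


def n50(contigs: list[str]) -> int:
    """Length L such that contigs of length >= L cover half the total bases."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0
```

Comparing `n50` for each single-k assembly against `n50` for the merged set is one simple way to check whether pooling improves contiguity on a given dataset. This quadratic substring scan is only practical for small inputs.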