California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA.
PeerJ. 2013 Jul 23;1:e113. doi: 10.7717/peerj.113. Print 2013.
The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. An accurate reference is critically important because all downstream steps, including estimating transcript abundance, depend on it. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported and, while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show, using a simulated and an empirical dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy and should be applied to all datasets. A complete collection of commands for producing Reptile-corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.