Department of Chemistry and Biochemistry, 419 Centennial Hall, Texas State University, 601 University Drive, San Marcos, TX 78666, USA.
Comp Biochem Physiol C Toxicol Pharmacol. 2012 Jan;155(1):95-101. doi: 10.1016/j.cbpc.2011.05.012. Epub 2011 Jun 1.
For many researchers, next-generation sequencing data hold the key to answering a category of questions that were previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge because pre-existing EST or genome reference libraries are lacking. While many publications describe the overall process of sequencing and assembly, few address how many reads, and what types of reads, are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set.
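As an illustration of how read subsets of differing quality and quantity might be prepared before assembly, the following is a minimal sketch and not the authors' pipeline, which the abstract does not describe. It assumes four-line FASTQ records with Phred+33 quality encoding; the function names (`mean_phred`, `parse_fastq`, `make_subset`) and the threshold values are hypothetical.

```python
import random


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality_line) / len(quality_line)


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    # Skip blank lines, then consume the input four lines at a time.
    it = (ln.strip() for ln in lines if ln.strip())
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def make_subset(records, min_mean_q, n_reads, seed=0):
    """Keep reads whose mean quality is at least min_mean_q, then draw
    up to n_reads of them at random (fixed seed for reproducibility)."""
    kept = [r for r in records if mean_phred(r[2]) >= min_mean_q]
    if n_reads >= len(kept):
        return kept
    return random.Random(seed).sample(kept, n_reads)


if __name__ == "__main__":
    # Tiny in-memory example; real input would be read from a FASTQ file.
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!!!"]
    reads = list(parse_fastq(demo))
    for min_q in (0, 20):          # hypothetical quality thresholds
        for n in (1, 2):           # hypothetical read quantities
            subset = make_subset(reads, min_q, n)
            print(min_q, n, len(subset))
```

Each subset produced this way could then be passed to a de novo assembler of choice, so that the resulting assemblies can be compared across quality thresholds and read counts.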