Beijing Genomics Institute at Shenzhen, Shenzhen 518083, China.
Genome Res. 2010 Feb;20(2):265-72. doi: 10.1101/gr.097261.109. Epub 2009 Dec 17.
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads they produce are very short, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both an Asian and an African human genome sequence, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
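For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the conventional definition: the length L such that contigs (or scaffolds) of length L or longer together contain at least half of all assembled bases. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the paper or its software.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (conventional definition)."""
    if not lengths:
        raise ValueError("need at least one length")
    # Walk from the longest sequence down, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which we cover half of all bases is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb, for illustration only.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))  # 10 + 6 = 16 >= 15 (half of 30), so this prints 6
```

Because the statistic is weighted by length, a few long scaffolds can raise N50 considerably, which is why contig and scaffold N50 values are reported separately.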