Simpson Jared T, Wong Kim, Jackman Shaun D, Schein Jacqueline E, Jones Steven J M, Birol Inanç
Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada.
Genome Res. 2009 Jun;19(6):1117-23. doi: 10.1101/gr.089532.108. Epub 2009 Feb 27.
Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
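The N50 figure quoted above is a standard assembly contiguity statistic. The sketch below shows one way to compute it, assuming the common definition: N50 is the contig length L such that contigs of length L or longer together account for at least half of the total assembled length. The function name `n50` and its `min_length` parameter are hypothetical. The 100 bp default only mirrors the abstract's ≥100 bp cutoff, and the code is not taken from ABySS.

```python
def n50(lengths, min_length=100):
    """Return the N50 of a collection of contig lengths (hypothetical helper).

    Contigs shorter than min_length are discarded first, mirroring the
    >=100 bp cutoff in the abstract. Returns 0 if no contig passes the filter.
    """
    # Keep only contigs at or above the length cutoff, longest first.
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        return 0
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        # Integer comparison avoids floating-point rounding: stop once the
        # cumulative length reaches at least half of the total.
        if 2 * running >= total:
            return length
    return kept[-1]  # not reached: the loop always returns by the last contig


# Usage: the 80 bp contig is filtered out; the remaining total is 1500 bp,
# and 600 + 450 = 1050 >= 750, so the N50 is 450.
print(n50([80, 150, 300, 450, 600]))  # 450
```

This sketch divides by the total length of the retained contigs. A related statistic, NG50, divides by an estimated genome size instead, which makes assemblies of different total size easier to compare.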