Ontario Institute for Cancer Research, Toronto, Canada.
Bioinformatics. 2014 May 1;30(9):1228-35. doi: 10.1093/bioinformatics/btu023. Epub 2014 Jan 17.
The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to assisting the user with the assembly.
This article addresses the practical aspects of de novo assembly by introducing new ways to perform quality assessment on a collection of sequence reads. The software implementation calculates per-base error rates, paired-end fragment-size distributions and coverage metrics in the absence of a reference genome. Additionally, the software estimates characteristics of the sequenced genome, such as repeat content and heterozygosity, which are key determinants of assembly difficulty.
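As a rough illustration of how a coverage-type metric can be obtained without a reference genome, the sketch below counts k-mers across a set of reads and builds a k-mer count histogram, whose main peak is a common proxy for sequencing depth. This is not the article's implementation, only a minimal example under stated assumptions: the function names, the toy reads and the `min_count` cutoff (used to skip low-count k-mers that are likely sequencing errors) are all hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map each occurrence count to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def spectrum_peak(spectrum, min_count=2):
    """Return the most frequent occurrence count at or above min_count.

    Hypothetical helper: k-mers seen fewer than min_count times are ignored
    because they are often sequencing errors. Ties go to the smaller count.
    Returns None if no k-mer reaches min_count.
    """
    candidates = {c: n for c, n in spectrum.items() if c >= min_count}
    if not candidates:
        return None
    return max(candidates, key=lambda c: (candidates[c], -c))


# Toy input: three short reads, the last one differing at its final base.
reads = ["ACGTAC", "ACGTAC", "ACGTAG"]
counts = kmer_counts(reads, k=3)
spectrum = kmer_spectrum(counts)
peak = spectrum_peak(spectrum)
```

On real data the reads would come from a FASTQ file and k would be much larger (values around 31 are typical in practice); a broad or multi-peaked histogram is the kind of signal that hints at repeats or heterozygosity.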