Bioinformatics Center, Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China.
BMC Bioinformatics. 2011 Dec 30;12:493. doi: 10.1186/1471-2105-12-493.
With the rapid development of next-generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated. Because of repetitive genomic regions and other factors, assembling very short reads remains challenging.
A novel strategy for improving genome assembly from very short reads is proposed. It increases assembly accuracy by integrating de novo contigs, and it produces comparative contigs from multiple references, which need not be genomes of closely related strains. The comparative contigs are then used to scaffold the de novo contigs. Using simulated and real datasets, it is shown that the strategy effectively improves the quality of assemblies of isolated microbial genomes and metagenomes.
As more reference genomes become available, our strategy will be useful for improving the quality of genome assemblies from very short reads. Scripts for applying the strategy are provided at http://code.google.com/p/cd-hybrid/.