Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.
Bioinformatics. 2018 Jul 1;34(13):i142-i150. doi: 10.1093/bioinformatics/bty266.
The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, together with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes.
In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low-coverage regions, we introduce the concept of an upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.
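As an illustration of what a contiguity quality metric looks like, the sketch below computes N50 and its reference-aware variant NG50 from a list of contig lengths. These are standard assembly statistics chosen here as an example; the abstract does not list which metrics QUAST-LG reports, and the function names (`nx`, `n50`, `ng50`) and the toy contig lengths are assumptions made for this sketch, not part of the tool.

```python
from typing import Optional, Sequence


def nx(lengths: Sequence[int], total: int, fraction: float = 0.5) -> Optional[int]:
    """Return the length L such that contigs of length >= L cover at least
    `fraction` of `total` bases, or None if the contigs never reach it."""
    threshold = total * fraction
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


def n50(lengths: Sequence[int]) -> Optional[int]:
    """N50: the threshold is half of the total assembly length."""
    return nx(lengths, sum(lengths))


def ng50(lengths: Sequence[int], genome_size: int) -> Optional[int]:
    """NG50: the threshold is half of the reference genome length,
    which makes assemblies of different total size comparable."""
    return nx(lengths, genome_size)


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]        # hypothetical contig lengths
    print(n50(contigs))                       # half of 290 is reached at the 70-base contig
    print(ng50(contigs, genome_size=400))     # half of 400 is reached at the 50-base contig
```

Because NG50 is normalized by the reference length rather than by the assembly's own length, it returns `None` when the assembly covers less than half of the genome, which is one simple way a reference-based evaluation can expose incompleteness.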
QUAST-LG is available at http://cab.spbu.ru/software/quast-lg.
Supplementary data are available at Bioinformatics online.