Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.

Author information

Cherukuri Yesesri, Janga Sarath Chandra

Affiliations

Department of Bio Health Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN, 46202, USA.

Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), 410 West 10th Street, Indianapolis, IN, 46202, USA.

Publication information

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.

DOI:10.1186/s12864-016-2895-8

PMID:27556636

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5001211/

Abstract

BACKGROUND

Improved DNA sequencing methods have transformed the field of genomics over the last decade. This became possible through the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of Nanopore-based single-molecule sequencing technology was developed, based on the MinION® sequencer, which is portable, inexpensive, and fast, and can generate reads longer than 100 kb. Despite these advantages, MinION reads have two major limitations: high error rates and the need to develop downstream pipelines. Algorithms for error correction have already emerged, while the development of pipelines is still at a nascent stage.

RESULTS

In this study, we benchmarked available assembly algorithms to find a framework that can efficiently assemble Nanopore-sequenced reads. To this end, we used genome-scale Nanopore-sequenced datasets available for the E. coli and yeast genomes. To evaluate multiple algorithmic frameworks comprehensively, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera), and greedy extension (SSAKE). We analyzed the quality and accuracy of the assemblies as well as the computational performance of each assembler in the benchmark. The OLC-based algorithm, Celera, generated a high-quality assembly, with N50 and mean contig values ten times higher, and a total number of contigs one-fifth that of the other tools. Celera also achieved an average genome coverage of 12% on the E. coli dataset and 70% on the yeast dataset, with relatively short run times. In contrast, the de Bruijn graph-based assemblers Velvet and ABySS generated assemblies of moderate quality, in less time when memory allocation was not limited, while the greedy extension-based algorithm SSAKE generated an assembly of very poor quality but with 90% genome coverage on the yeast dataset.
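The contiguity metrics named above (N50, mean contig value, and total number of contigs) can be computed from a list of contig lengths. The following is a minimal sketch, not the authors' evaluation code: it assumes that the "mean contig value" refers to mean contig length, the function names `n50` and `summarize` are hypothetical, and the toy lengths are illustrative rather than taken from the study.

```python
from typing import Dict, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, going from the longest
    contig to the shortest, the running sum of lengths first reaches at
    least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


def summarize(contig_lengths: Sequence[int]) -> Dict[str, float]:
    """Collect basic contiguity statistics for one assembly."""
    total = sum(contig_lengths)
    return {
        "num_contigs": len(contig_lengths),
        "total_bases": total,
        "mean_contig_length": total / len(contig_lengths),
        "n50": n50(contig_lengths),
    }


if __name__ == "__main__":
    # Illustrative contig lengths only (not data from the paper).
    toy_assembly = [80, 70, 50, 40, 30, 20, 10]
    print(summarize(toy_assembly))
```

An assembly with fewer, longer contigs scores a higher N50 and mean contig length for the same total bases, which is why these numbers are reported together with the contig count.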

CONCLUSION

OLC can be considered a favorable algorithmic framework for developing assembler tools for Nanopore-based data, followed by de Bruijn-based algorithms, whose run times for generating an assembly are lower than or similar to those of OLC-based algorithms, irrespective of the memory allocated for the task. However, a few improvements must be made to existing de Bruijn implementations before they can generate assemblies of reasonable quality. Our findings should help stimulate the development of novel assemblers for handling Nanopore sequence data.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b80d/5001211/b565fb52cbc8/12864_2016_2895_Fig1_HTML.jpg
