HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads.

Author information

Al-Okaily Anas A

Affiliation

Computer Science & Engineering Department, University of Connecticut, Storrs, 06269, CT, USA.

Publication information

BMC Genomics. 2016 Mar 5;17:193. doi: 10.1186/s12864-016-2515-7.

DOI:10.1186/s12864-016-2515-7

PMID:26945881

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4779561/

Abstract

BACKGROUND

Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high-quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert sizes, the long-insert libraries are costly to generate. Recently, the GAGE-B study showed that remarkably good assembly quality can be obtained for bacterial genomes by running state-of-the-art assemblers on a single short-insert library with very high coverage.

RESULTS

In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads.
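
The first step described above, splitting the reads into disjoint subsets, can be illustrated with a minimal sketch. The abstract does not say how the subsets are chosen, so the seeded random shuffle, the function name `partition_reads`, and the read identifiers are assumptions made for illustration only; the later steps (assembling each subset, merging, and re-assembling with the original reads) depend on the chosen assembler and are not shown.

```python
import random


def partition_reads(read_pairs, num_subsets, seed=0):
    """Split read pairs into num_subsets disjoint, near-equal subsets.

    Hypothetical helper: a random partition is an assumption, not
    necessarily the strategy used by HGA.
    """
    if num_subsets < 1:
        raise ValueError("num_subsets must be at least 1")
    shuffled = list(read_pairs)
    random.Random(seed).shuffle(shuffled)
    # Strided slicing places every pair in exactly one subset.
    return [shuffled[i::num_subsets] for i in range(num_subsets)]


# Toy input: ten paired-end reads, kept together as (mate1, mate2) tuples.
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(10)]
subsets = partition_reads(pairs, num_subsets=3)
```

Keeping the two mates of a pair in the same subset matters because each subset is assembled independently, and paired-end assemblers generally expect both mates to be present.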

CONCLUSIONS

We empirically evaluated this methodology for 8 leading assemblers using 7 GAGE-B bacterial datasets consisting of 100 bp Illumina HiSeq and 250 bp Illumina MiSeq reads, with coverage ranging from 100x to ∼200x. The results show that, for all evaluated datasets and for most of the evaluated assemblers (which were used to assemble the disjoint subsets), HGA leads to a significant improvement in assembly quality based on the N50 and corrected N50 metrics.
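
For readers unfamiliar with the N50 metric mentioned above, the following is a minimal sketch of its standard definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The function name and the toy contig lengths are illustrative assumptions and are not taken from the paper. Corrected N50, as used in GAGE-style evaluations, is generally obtained by first breaking contigs at detected misassemblies and then applying the same calculation to the resulting pieces.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # First contig at which the cumulative length covers half the assembly.
        if running >= half_total:
            return length
    return 0


# Toy assembly: total length 290, so the threshold is 145.
toy_n50 = n50([80, 70, 50, 40, 30, 20])
```

In this toy case the two longest contigs (80 + 70 = 150) already cover half of the assembly, so the N50 is 70.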


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7ff/4779561/24ae80e3fadc/12864_2016_2515_Fig1_HTML.jpg
