SEQuel: improving the accuracy of genome assemblies.

Affiliation

Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA.

Publication information

Bioinformatics. 2012 Jun 15;28(12):i188-96. doi: 10.1093/bioinformatics/bts219.

PMID: 22689760

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3371851/

Abstract

MOTIVATION

Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model.
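The positional de Bruijn graph described above can be illustrated with a toy sketch. It assumes that every read has already been placed on the contig (for example by a read aligner), so each read comes with an approximate start position. Each node is then a (k-mer, position) pair, and edges join consecutive k-mers of the same read, weighted by how many reads support them. The names `build_positional_dbg` and `consensus` are hypothetical, positions are matched exactly rather than with the tolerance a real tool would need, and the greedy heaviest-edge walk is a simple stand-in for illustration, not SEQuel's actual correction procedure.

```python
from collections import defaultdict

# Hypothetical sketch of a positional de Bruijn graph (not SEQuel's code).
# Each read is given as (start, sequence), where `start` is its assumed
# alignment position on the contig.


def build_positional_dbg(aligned_reads, k):
    """Return {(u, v): weight}, where u and v are (k-mer, position) nodes."""
    edges = defaultdict(int)
    for start, read in aligned_reads:
        # Join each k-mer to the next one in the same read; a k-mer's
        # position is the read's start plus its offset inside the read.
        for i in range(len(read) - k):
            u = (read[i:i + k], start + i)
            v = (read[i + 1:i + k + 1], start + i + 1)
            edges[(u, v)] += 1
    return edges


def consensus(edges):
    """Spell a sequence by greedily following the best-supported edges."""
    out = defaultdict(list)
    support = defaultdict(int)
    for (u, v), weight in edges.items():
        out[u].append((weight, v))
        support[u] += weight
    # Start from the best-supported k-mer at the leftmost position.
    first_pos = min(pos for _, pos in out)
    node = max((s, u) for u, s in support.items() if u[1] == first_pos)[1]
    seq = node[0]
    # Positions strictly increase along every edge, so the walk terminates.
    while out.get(node):
        _, node = max(out[node])
        seq += node[0][-1]  # each step adds the last base of the next k-mer
    return seq


# Toy data: three error-free reads and one read with a T->A substitution.
READS = [
    (0, "ACGTTGCA"),
    (0, "ACGTTGCAAG"),
    (3, "TTGCAAGT"),
    (0, "ACGATGCA"),  # erroneous copy of the first read
]
corrected = consensus(build_positional_dbg(READS, k=3))
print(corrected)  # ACGTTGCAAGT
```

Because the position is part of each node, identical k-mers from different places in the contig stay in separate nodes in this toy model, and the erroneous read only adds a low-weight side branch (`CGA`, `GAT`, `ATG`) that the walk does not follow.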

RESULTS

SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly.
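The results are reported as numbers of insertions, deletions and substitutions. As an illustration of how such changes can be tallied (an assumption about methodology, not the paper's evaluation pipeline), the sketch below aligns an original contig with its corrected version using the standard Levenshtein dynamic program and counts each operation along one optimal alignment. The function name `count_edits` is hypothetical.

```python
def count_edits(original, corrected):
    """Count substitutions, insertions and deletions turning `original` into `corrected`."""
    n, m = len(original), len(corrected)
    # dp[i][j] = edit distance between original[:i] and corrected[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(original[i - 1] != corrected[j - 1])
            dp[i][j] = min(dp[i - 1][j - 1] + mismatch,  # match or substitution
                           dp[i - 1][j] + 1,             # deletion
                           dp[i][j - 1] + 1)             # insertion
    # Trace back one optimal alignment and classify each step.
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dp[i][j]
                == dp[i - 1][j - 1] + int(original[i - 1] != corrected[j - 1])):
            if original[i - 1] != corrected[j - 1]:
                counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["deletion"] += 1   # base present only in the original
            i -= 1
        else:
            counts["insertion"] += 1  # base present only in the corrected contig
            j -= 1
    return counts


print(count_edits("ACGATGCAAGT", "ACGTTGCAAGT"))
# {'substitution': 1, 'insertion': 0, 'deletion': 0}
```

When several optimal alignments exist, the per-type counts can depend on the traceback's tie-breaking order, although their total always equals the edit distance.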

AVAILABILITY

SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58a8/3371851/5acab99b7093/bts219f1.jpg
