Scoring-and-unfolding trimmed tree assembler: concepts, constructs and comparisons.

Affiliation

Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA.

Publication information

Bioinformatics. 2011 Jan 15;27(2):153-60. doi: 10.1093/bioinformatics/btq646. Epub 2010 Nov 18.

DOI:10.1093/bioinformatics/btq646

PMID:21088026

Abstract

MOTIVATION

Because of its connection to a well-known NP-complete combinatorial optimization problem, namely the Shortest Common Superstring Problem (SCSP), the whole-genome sequence assembly (WGSA) problem has historically been assumed to be amenable only to greedy and heuristic methods. By placing efficiency as their first priority, these methods rely only on local searches and are thus inherently approximate, ambiguous or error-prone, especially for genomes with complex structures. Furthermore, since the choice of the best heuristics depends critically on the properties of the input data (e.g. its errors) and on the available long-range information, these approaches have hindered the design of an error-free WGSA pipeline.

RESULTS

We dispense with the idea of limiting the solutions to approximate ones, and instead favor an approach that could potentially lead to an exhaustive (exponential-time) search of all possible layouts. Its computational complexity must therefore be tamed through a constrained search (Branch-and-Bound) and the quick identification and pruning of implausible overlays. For this purpose, such a method necessarily relies on a set of score functions (oracles) that can combine different structural properties (e.g. transitivity, coverage, physical maps, etc.). We give a detailed description of this novel assembly framework, referred to as Scoring-and-Unfolding Trimmed Tree Assembler (SUTTA), and present experimental results on several bacterial genomes using next-generation sequencing technology data. We also report experimental evidence that the assembly quality strongly depends on the choice of the minimum overlap parameter k.
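The general idea of a branch-and-bound layout search driven by a score function can be illustrated with a toy sketch. This is not SUTTA's implementation: the function names (`overlap`, `best_layout`, `assemble`), the single score (total exact suffix-prefix overlap) and the bound are all assumptions made for illustration, whereas the abstract describes oracles that combine several structural properties. Overlaps shorter than the minimum overlap `k` are treated as absent.

```python
from typing import List, Tuple


def overlap(a: str, b: str, k: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than k."""
    for n in range(min(len(a), len(b)), max(k, 1) - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def best_layout(reads: List[str], k: int) -> Tuple[int, List[int]]:
    """Toy branch-and-bound: find the read order with maximal total overlap."""
    n = len(reads)
    if n == 0:
        return 0, []
    # Pairwise overlap scores (hypothetical stand-in for a scoring oracle).
    ov = [[overlap(reads[i], reads[j], k) if i != j else 0 for j in range(n)]
          for i in range(n)]
    # Best overlap any read can contribute when placed after a predecessor.
    max_in = [max(ov[i][j] for i in range(n)) for j in range(n)]
    best_score, best_order = -1, []

    def extend(order: List[int], used: set, score: int) -> None:
        nonlocal best_score, best_order
        if len(order) == n:
            if score > best_score:
                best_score, best_order = score, order[:]
            return
        # Optimistic bound: every unplaced read achieves its best overlap.
        bound = score + sum(max_in[j] for j in range(n) if j not in used)
        if bound <= best_score:
            return  # prune: this subtree cannot beat the incumbent
        last = order[-1]
        candidates = [j for j in range(n) if j not in used]
        # Explore the most promising extension first to tighten the bound early.
        for j in sorted(candidates, key=lambda j: -ov[last][j]):
            order.append(j)
            used.add(j)
            extend(order, used, score + ov[last][j])
            order.pop()
            used.remove(j)

    for start in range(n):
        extend([start], {start}, 0)
    return best_score, best_order


def assemble(reads: List[str], k: int) -> str:
    """Merge reads along the best layout, collapsing overlaps of length >= k."""
    _, order = best_layout(reads, k)
    if not order:
        return ""
    contig = reads[order[0]]
    for prev, cur in zip(order, order[1:]):
        contig += reads[cur][overlap(reads[prev], reads[cur], k):]
    return contig
```

On the four toy reads `ATTAGACCTG`, `CCTGCCGGAA`, `AGACCTGCCG` and `GCCGGAATAC`, calling `assemble(reads, 3)` recovers the 19-base string `ATTAGACCTGCCGGAATAC`, while raising `k` to 8 discards every overlap and the layout degenerates into plain concatenation. Even in this simplified setting the result is sensitive to the minimum overlap parameter.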

AVAILABILITY AND IMPLEMENTATION

SUTTA's binaries are freely available to non-profit institutions for research and educational purposes at http://www.bioinformatics.nyu.edu.

Similar articles

QuorUM: An Error Corrector for Illumina Reads.

PLoS One. 2015 Jun 17;10(6):e0130821. doi: 10.1371/journal.pone.0130821. eCollection 2015.

A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data.

Bioinformatics. 2015 Feb 15;31(4):587-9. doi: 10.1093/bioinformatics/btu661. Epub 2014 Oct 22.

Gossamer--a resource-efficient de novo assembler.

Bioinformatics. 2012 Jul 15;28(14):1937-8. doi: 10.1093/bioinformatics/bts297. Epub 2012 May 18.

PE-Assembler: de novo assembler using short paired-end reads.

Bioinformatics. 2011 Jan 15;27(2):167-74. doi: 10.1093/bioinformatics/btq626. Epub 2010 Dec 12.

Haplotype assembly in polyploid genomes and identical by descent shared tracts.

Bioinformatics. 2013 Jul 1;29(13):i352-60. doi: 10.1093/bioinformatics/btt213.

HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly.

BMC Genomics. 2014;15 Suppl 10(Suppl 10):S9. doi: 10.1186/1471-2164-15-S10-S9. Epub 2014 Dec 12.

On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes.

J Comput Biol. 2016 Sep;23(9):718-36. doi: 10.1089/cmb.2015.0220. Epub 2016 Jun 9.

TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.

Bioinformatics. 2011 Sep 1;27(17):2330-7. doi: 10.1093/bioinformatics/btr393. Epub 2011 Jun 30.

BatMis: a fast algorithm for k-mismatch mapping.

Bioinformatics. 2012 Aug 15;28(16):2122-8. doi: 10.1093/bioinformatics/bts339. Epub 2012 Jun 10.

Cited by

Facilitated sequence counting and assembly by template mutagenesis.

Proc Natl Acad Sci U S A. 2014 Oct 28;111(43):E4632-7. doi: 10.1073/pnas.1416204111. Epub 2014 Oct 13.

Accurate de novo and transmitted indel detection in exome-capture data using microassembly.

Nat Methods. 2014 Oct;11(10):1033-6. doi: 10.1038/nmeth.3069. Epub 2014 Aug 17.

Toward single-molecule optical mapping of the epigenome.

ACS Nano. 2014 Jan 28;8(1):14-26. doi: 10.1021/nn4050694. Epub 2013 Dec 20.

Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons.

PLoS One. 2012;7(12):e52210. doi: 10.1371/journal.pone.0052210. Epub 2012 Dec 28.

AGORA: Assembly Guided by Optical Restriction Alignment.

BMC Bioinformatics. 2012 Aug 2;13:189. doi: 10.1186/1471-2105-13-189.

Enzymatically incorporated genomic tags for optical mapping of DNA-binding proteins.

Angew Chem Int Ed Engl. 2012 Apr 10;51(15):3578-81. doi: 10.1002/anie.201107714. Epub 2012 Feb 16.

Feature-by-feature--evaluating de novo sequence assembly.

PLoS One. 2012;7(2):e31002. doi: 10.1371/journal.pone.0031002. Epub 2012 Feb 3.

Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies.

Brief Bioinform. 2013 Mar;14(2):213-24. doi: 10.1093/bib/bbr074. Epub 2011 Dec 23.

Comparing de novo genome assembly: the long and short of it.

PLoS One. 2011 Apr 29;6(4):e19175. doi: 10.1371/journal.pone.0019175.
