Fast scaffolding with small independent mixed integer programs.

Affiliation

Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland.

Publication information

Bioinformatics. 2011 Dec 1;27(23):3259-65. doi: 10.1093/bioinformatics/btr562. Epub 2011 Oct 13.

DOI: 10.1093/bioinformatics/btr562

PMID: 21998153

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3223363/

Abstract

MOTIVATION

Assembling genomes from short-read data has become increasingly popular, but the problem remains computationally challenging, especially for larger genomes. We study the scaffolding phase of sequence assembly, in which preassembled contigs are ordered based on mate-pair data.

RESULTS

We present MIP Scaffolder, which divides the scaffolding problem into smaller subproblems and solves them with mixed integer programming. The scaffolding problem can be represented as a graph, and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state-of-the-art methods, SOPRA and SSPACE. MIP Scaffolder is fast and, on large genomes, produces scaffolds that are as good as or better than those of its competitors.
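The decomposition into biconnected components can be illustrated with a minimal sketch. It makes three assumptions, none taken from the paper:

- The scaffold graph is a simple undirected graph.
- Vertices are contigs.
- Edges are contig pairs linked by mate pairs.

The contig names and links below are hypothetical. The sketch uses the standard Hopcroft-Tarjan edge-stack algorithm. Each resulting component stands for one subproblem that would then be handed to a MIP solver. Neither the MIP formulation nor the size-restriction technique mentioned above is reproduced here.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Split a simple undirected graph into biconnected components.

    Hopcroft-Tarjan: a DFS assigns discovery times and low-links, and
    edges are kept on a stack until a component is closed off.
    Returns a list of components, each a list of (u, v) edges.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    disc = {}        # DFS discovery time of each vertex
    low = {}         # smallest discovery time reachable from the subtree via one back edge
    stack = []       # edges of components not yet closed off
    components = []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if v not in disc:                  # tree edge
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates v's subtree: pop edges up to (u, v).
                    comp = []
                    while True:
                        edge = stack.pop()
                        comp.append(edge)
                        if edge == (u, v):
                            break
                    components.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for u in list(adj):
        if u not in disc:
            dfs(u, None)
    return components


# Hypothetical contig links: two triangles joined by a bridge, plus a pendant contig G.
links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "D"), ("F", "G")]

for comp in biconnected_components(links):
    print(sorted({contig for edge in comp for contig in edge}))
# ['F', 'G']
# ['D', 'E', 'F']
# ['C', 'D']
# ['A', 'B', 'C']
```

Every edge lands in exactly one component. Two components share at most one vertex, a cut vertex (here C, D and F). The recursive DFS is adequate for small illustrative graphs; a genome-scale graph would call for an iterative traversal.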

AVAILABILITY

The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/.

CONTACT

leena.salmela@cs.helsinki.fi.

Similar articles

GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies.

Bioinformatics. 2012 Jun 1;28(11):1429-37. doi: 10.1093/bioinformatics/bts175. Epub 2012 Apr 6.

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads.

BMC Genomics. 2018 Mar 2;19(1):175. doi: 10.1186/s12864-018-4567-3.

SCARPA: scaffolding reads with practical algorithms.

Bioinformatics. 2013 Feb 15;29(4):428-34. doi: 10.1093/bioinformatics/bts716. Epub 2012 Dec 29.

Scaffolding pre-assembled contigs using SSPACE.

Bioinformatics. 2011 Feb 15;27(4):578-9. doi: 10.1093/bioinformatics/btq683. Epub 2010 Dec 12.

PEP_scaffolder: using (homologous) proteins to scaffold genomes.

Bioinformatics. 2016 Oct 15;32(20):3193-3195. doi: 10.1093/bioinformatics/btw378. Epub 2016 Jun 22.

Accurate self-correction of errors in long reads using de Bruijn graphs.

Bioinformatics. 2017 Mar 15;33(6):799-806. doi: 10.1093/bioinformatics/btw321.

Assembly scaffolding with PE-contaminated mate-pair libraries.

Bioinformatics. 2016 Jul 1;32(13):1925-32. doi: 10.1093/bioinformatics/btw064. Epub 2016 Mar 2.

L_RNA_scaffolder: scaffolding genomes with transcripts.

BMC Genomics. 2013 Sep 8;14:604. doi: 10.1186/1471-2164-14-604.

ILP-based maximum likelihood genome scaffolding.

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S9. doi: 10.1186/1471-2105-15-S9-S9. Epub 2014 Sep 10.

Cited by

Maptcha: an efficient parallel workflow for hybrid genome scaffolding.

BMC Bioinformatics. 2024 Aug 8;25(1):263. doi: 10.1186/s12859-024-05878-4.

Graph-based self-supervised learning for repeat detection in metagenomic assembly.

Genome Res. 2024 Oct 11;34(9):1468-1476. doi: 10.1101/gr.279136.124.

Global exact optimisations for chloroplast structural haplotype scaffolding.

Algorithms Mol Biol. 2024 Feb 6;19(1):5. doi: 10.1186/s13015-023-00243-1.

SWALO: scaffolding with assembly likelihood optimization.

Nucleic Acids Res. 2021 Nov 18;49(20):e117. doi: 10.1093/nar/gkab717.

Draft Genome Sequence of the Non-Microcystin-Producing Microcystis aeruginosa Strain KLA2, Isolated from a Freshwater Reservoir in Northern California, USA.

Microbiol Resour Announc. 2020 Jan 16;9(3):e01086-19. doi: 10.1128/MRA.01086-19.

LRScaf: improving draft genomes using long noisy reads.

BMC Genomics. 2019 Dec 9;20(1):955. doi: 10.1186/s12864-019-6337-2.

Modern technologies and algorithms for scaffolding assembled genomes.

PLoS Comput Biol. 2019 Jun 5;15(6):e1006994. doi: 10.1371/journal.pcbi.1006994. eCollection 2019 Jun.

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads.

BMC Genomics. 2018 Mar 2;19(1):175. doi: 10.1186/s12864-018-4567-3.

Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.

Brief Bioinform. 2019 Jul 19;20(4):1140-1150. doi: 10.1093/bib/bbx098.

Approaches for in silico finishing of microbial genome sequences.

Genet Mol Biol. 2017;40(3):553-576. doi: 10.1590/1678-4685-GMB-2016-0230.

References

High-quality draft assemblies of mammalian genomes from massively parallel sequence data.

Proc Natl Acad Sci U S A. 2011 Jan 25;108(4):1513-8. doi: 10.1073/pnas.1017351108. Epub 2010 Dec 27.

Scaffolding pre-assembled contigs using SSPACE.

Bioinformatics. 2011 Feb 15;27(4):578-9. doi: 10.1093/bioinformatics/btq683. Epub 2010 Dec 12.

SOPRA: Scaffolding algorithm for paired reads via statistical optimization.

BMC Bioinformatics. 2010 Jun 24;11:345. doi: 10.1186/1471-2105-11-345.

De novo assembly of human genomes with massively parallel short read sequencing.

Genome Res. 2010 Feb;20(2):265-72. doi: 10.1101/gr.097261.109. Epub 2009 Dec 17.

The Sequence Alignment/Map format and SAMtools.

Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.

SOAP2: an improved ultrafast tool for short read alignment.

Bioinformatics. 2009 Aug 1;25(15):1966-7. doi: 10.1093/bioinformatics/btp336. Epub 2009 Jun 3.

Genome assembly reborn: recent computational challenges.

Brief Bioinform. 2009 Jul;10(4):354-66. doi: 10.1093/bib/bbp026. Epub 2009 May 29.

ABySS: a parallel assembler for short read sequence data.

Genome Res. 2009 Jun;19(6):1117-23. doi: 10.1101/gr.089532.108. Epub 2009 Feb 27.

Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Genome Res. 2008 May;18(5):821-9. doi: 10.1101/gr.074492.107. Epub 2008 Mar 18.

ALLPATHS: de novo assembly of whole-genome shotgun microreads.

Genome Res. 2008 May;18(5):810-20. doi: 10.1101/gr.7337908. Epub 2008 Mar 13.
