Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland.
Bioinformatics. 2011 Dec 1;27(23):3259-65. doi: 10.1093/bioinformatics/btr562. Epub 2011 Oct 13.
Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging especially for larger genomes. We study the scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data.
We present MIP Scaffolder that divides the scaffolding problem into smaller subproblems and solves these with mixed integer programming. The scaffolding problem can be represented as a graph and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state of the art methods, SOPRA and SSPACE. MIP Scaffolder is fast and produces better or as good scaffolds as its competitors on large genomes.
The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/.
从短读数据组装基因组已变得越来越流行,但该问题在计算上仍然具有挑战性,尤其是对于较大的基因组。我们研究序列组装的支架阶段,其中根据配对数据对预组装的 contigs 进行排序。
我们提出了 MIP Scaffolder,它将支架问题分解为更小的子问题,并使用混合整数规划来解决这些问题。支架问题可以表示为一个图,并且可以独立地解决该图的双连通分量。我们提出了一种限制这些子问题大小的技术,以便可以使用混合整数规划准确地解决这些子问题。我们将 MIP Scaffolder 与两种最先进的方法 SOPRA 和 SSPACE 进行了比较。MIP Scaffolder 速度快,并且在大型基因组上的表现与竞争对手一样好或更好。
MIP Scaffolder 的源代码可在 http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/ 上免费获得。