BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S5. doi: 10.1186/1471-2105-15-S9-S5. Epub 2014 Sep 10.
Multi-assembly problems have gathered much attention in the last years, as Next-Generation Sequencing technologies have started being applied to mixed settings, such as reads from the transcriptome (RNA-Seq), or from viral quasi-species. One classical model that has resurfaced in many multi-assembly methods (e.g. in Cufflinks, ShoRAH, BRANCH, CLASS) is the Minimum Path Cover (MPC) Problem, which asks for the minimum number of directed paths that cover all the nodes of a directed acyclic graph. The MPC Problem is highly popular because the acyclicity of the graph ensures its polynomial-time solvability.
In this paper, we consider two generalizations of it dealing with integrating constraints arising from long reads or paired-end reads; these extensions have also been considered by two recent methods, but not fully solved. More specifically, we study the two problems where also a set of subpaths, or pairs of subpaths, of the graph have to be entirely covered by some path in the MPC. We show that in the case of long reads (subpaths), the generalized problem can be solved in polynomial-time by a reduction to the classical MPC Problem. We also consider the weighted case, and show that it can be solved in polynomial-time by a reduction to a min-cost circulation problem. As a side result, we also improve the time complexity of the classical minimum weight MPC Problem. In the case of paired-end reads (pairs of subpaths), the generalized problem becomes NP-hard, but we show that it is fixed-parameter tractable (FPT) in the total number of constraints. This computational dichotomy between long reads and paired-end reads is also a general insight into multi-assembly problems.
随着下一代测序技术开始应用于混合环境,如转录组(RNA-Seq)或病毒准种的读取,多组装问题近年来受到了广泛关注。在许多多组装方法中重新出现的一个经典模型是最小路径覆盖(MPC)问题,它要求覆盖有向无环图所有节点的有向路径的最小数量。MPC 问题非常流行,因为图的无环性确保了其多项式时间可解性。
在本文中,我们考虑了两种广义化的问题,这些问题涉及到整合长读或配对末端读产生的约束;这两种扩展也被最近的两种方法考虑到,但没有完全解决。具体来说,我们研究了两个问题,其中还需要通过 MPC 中的某些路径完全覆盖图的一组子路径或对子路径。我们表明,在长读(子路径)的情况下,通过将广义问题归约为经典 MPC 问题,可以在多项式时间内解决。我们还考虑了加权情况,并表明通过归约到最小成本循环问题可以在多项式时间内解决。作为一个附带的结果,我们还改进了经典最小权 MPC 问题的时间复杂度。在配对末端读(对子路径)的情况下,广义问题变得 NP 难,但我们表明它在约束总数上是固定参数可解的(FPT)。这种长读和配对末端读之间的计算二分法也是对多组装问题的一个普遍见解。