Department of Computer Science, University of Helsinki, Helsinki, Finland.
School of Computing, Montana State University, Bozeman, Montana, USA.
J Comput Biol. 2022 Nov;29(11):1252-1267. doi: 10.1089/cmb.2022.0257. Epub 2022 Oct 18.
Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of the exponentially many solution paths using only a number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves instance in <13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on Github.
最小流量分解(MFD)是一个 NP 难问题,要求将网络流量分解为最小的路径集(以及相关的权重)。它的变体是生物信息学中多组装问题的强大模型,如 RNA 组装。由于其难度,实际的多组装工具要么使用启发式方法,要么解决问题的更简单、多项式时间可解的版本,这可能导致非最小或不完全分解流量的解决方案。在这里,我们提供了第一个基于整数线性规划(ILP)的无环流量网络上的 MFD 快速精确求解器。我们方法的关键是使用仅少数变量对指数数量的解路径进行编码。我们还将我们的 ILP 公式扩展到许多实际变体,例如合并更长或配对末端读取,或最小化流量误差。在模拟和真实流量拼接图上,我们的方法在 <13 秒内解决了实例。我们希望我们的配方可以成为未来实用的 RNA 组装工具的核心。我们的实现可以在 Github 上免费获得。