Dias Fernando H C, Tomescu Alexandru I
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1955-1964. doi: 10.1109/TCBB.2024.3433523. Epub 2024 Dec 10.
Minimum flow decomposition (MFD) is a common problem across various fields of Computer Science, where a flow is decomposed into a minimum set of weighted paths. However, in Bioinformatics applications, such as RNA transcript or quasi-species assembly, the flow is erroneous since it is obtained from noisy read coverages. Typical generalizations of the MFD problem to handle errors are based on least-squares formulations or modelling the erroneous flow values as ranges. All of these are thus focused on error handling at the level of individual edges. In this paper, we interpret the flow decomposition problem as a robust optimization problem and lift error-handling from individual edges to solution paths. As such, we introduce a new minimum path-error flow decomposition problem, for which we give an Integer Linear Programming formulation. Our experimental results reveal that our formulation can account for errors significantly better, by lowering the inaccuracy rate by 30-50% compared to previous error-handling formulations, with computational requirements that remain practical.
最小流分解(MFD)是计算机科学各个领域中一个常见的问题,其中流被分解为一组最小的加权路径。然而,在生物信息学应用中,例如RNA转录本或准物种组装,由于流是从有噪声的读覆盖中获得的,所以它是错误的。将MFD问题处理错误的典型推广基于最小二乘公式或将错误的流值建模为范围。因此,所有这些都集中在单个边的错误处理上。在本文中,我们将流分解问题解释为一个鲁棒优化问题,并将错误处理从单个边提升到求解路径。因此,我们引入了一个新的最小路径误差流分解问题,并给出了其整数线性规划公式。我们的实验结果表明,与以前的错误处理公式相比,我们的公式能够显著更好地处理错误,将不准确率降低30%-50%,同时计算要求仍然可行。