Sena Francisco, Ingervo Eliel, Khan Shahbaz, Prjibelski Andrey, Schmidt Sebastian, Tomescu Alexandru
University of Helsinki, Helsinki, Finland.
Indian Institute of Technology Roorkee, Roorkee, India.
iScience. 2024 Oct 25;27(12):111208. doi: 10.1016/j.isci.2024.111208. eCollection 2024 Dec 20.
A of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization () of walks that are in such general flow decompositions, i.e., that are subwalks of any possible flow decomposition. We provide an ()-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the species abundances as flow values of the metagenomic assembly graph, we can model the possible assembly solutions as flow decompositions into weighted closed walks. On simulated data, compared to reporting unitigs or maximal safe walks based only on the graph structure, reporting flowtigs results in a notably more contiguous assembly. On real data, we frame flowtigs as a heuristic and provide an algorithm that is guided by this heuristic.
网络流的一个流tig是一组加权路径,其叠加等于该流。在本文中,我们给出了在这种一般流分解中有效的路径的简单且可线性时间验证的完整刻画(),即任何可能的流分解的子路径。我们提供了一个()时间算法,该算法可识别所有最大流tig并将它们表示在一个紧凑结构中。在实际方面,我们在宏基因组组装的用例中研究流tig。通过将物种丰度用作宏基因组组装图的流值,我们可以将可能的组装解决方案建模为加权闭路径的流分解。在模拟数据上,与仅基于图结构报告单条序列或最大安全路径相比,报告流tig会产生明显更连续的组装。在真实数据上,我们将流tig构建为一种启发式方法,并提供一种受此启发式方法指导的算法。