最小路径流分解问题的理论与启发式方法。

Theory and A Heuristic for the Minimum Path Flow Decomposition Problem.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):658-670. doi: 10.1109/TCBB.2017.2779509. Epub 2017 Dec 4.

DOI:10.1109/TCBB.2017.2779509

Abstract

Motivated by multiple genome assembly problems and other applications, we study the following minimum path flow decomposition problem: Given a directed acyclic graph $G=(V,E)$G=(V,E) with source $s$s and sink $t$t and a flow $f$f, compute a set of $s$s-$t$t paths $P$P and assign weight $w(p)$w(p) for $p\in P$p∈P such that $f(e) = \sum _{p\in P: e\in p} w(p)$f(e)=∑p∈P:e∈pw(p), $\forall e\in E$∀e∈E, and $|P|$|P| is minimized. We develop some fundamental theory for this problem, upon which we design an efficient heuristic. Specifically, we prove that the gap between the optimal number of paths and a known upper bound is determined by the nontrivial equations within the flow values. This result gives rise to the framework of our heuristic: to iteratively reduce the gap through identifying such equations. We also define an operation on certain independent substructures of the graph, and prove that this operation does not affect the optimality but can transform the graph into one with desired property that facilitates reducing the gap. We apply and test our algorithm on both simulated random instances and perfect splice graph instances, and also compare it with the existing state-of-art algorithm for flow decomposition. The results illustrate that our algorithm can achieve very high accuracy on these instances, and also that our algorithm significantly improves on the previous algorithms. An implementation of our algorithm is freely available at https://github.com/Kingsford-Group/catfish.

摘要

受多个基因组组装问题和其他应用的启发，我们研究了以下最小路径流分解问题：给定一个有源点 $s$ 和汇点 $t$ 的有向无环图 $G=(V,E)$ 和流 $f$，计算一组 $s$-$t$ 路径 $P$ 并为 $p\in P$ 分配权重 $w(p)$，使得 $f(e) = \sum _{p\in P: e\in p} w(p)$，$\forall e\in E$，并且 $|P|$ 最小化。我们为这个问题发展了一些基本理论，并在此基础上设计了一个有效的启发式算法。具体来说，我们证明了最优路径数和已知上界之间的差距是由流值中的非平凡方程决定的。这个结果为我们的启发式算法提供了框架：通过识别这些方程来迭代地缩小差距。我们还定义了图的某些独立子结构上的一个操作，并证明这个操作不会影响最优性，但可以将图转化为具有所需性质的图，从而便于缩小差距。我们在模拟随机实例和完美拼接图实例上应用和测试了我们的算法，并将其与现有的流分解的最先进算法进行了比较。结果表明，我们的算法在这些实例上可以达到非常高的准确性，并且显著优于以前的算法。我们的算法的实现可在 https://github.com/Kingsford-Group/catfish 上免费获得。

相似文献

Theory and A Heuristic for the Minimum Path Flow Decomposition Problem.最小路径流分解问题的理论与启发式方法。

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):658-670. doi: 10.1109/TCBB.2017.2779509. Epub 2017 Dec 4.

Improving RNA Assembly via Safety and Completeness in Flow Decompositions.通过流分解中的安全性和完整性提高 RNA 组装

J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25.

Flow Decomposition With Subpath Constraints.具有子路径约束的流分解

IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):360-370. doi: 10.1109/TCBB.2022.3147697. Epub 2023 Feb 3.

Bit-parallel sequence-to-graph alignment.位并行序列到图的对齐。

Bioinformatics. 2019 Oct 1;35(19):3599-3607. doi: 10.1093/bioinformatics/btz162.

Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.草莓：基于RNA测序的快速且准确的基因组引导转录本重建与定量分析

PLoS Comput Biol. 2017 Nov 27;13(11):e1005851. doi: 10.1371/journal.pcbi.1005851. eCollection 2017 Nov.

Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs.循环无亲缘基因组图谱的缝隙敏感共线性链接算法。

J Comput Biol. 2023 Nov;30(11):1182-1197. doi: 10.1089/cmb.2023.0186. Epub 2023 Oct 30.

Path matching and graph matching in biological networks.生物网络中的路径匹配和图匹配

J Comput Biol. 2007 Jan-Feb;14(1):56-67. doi: 10.1089/cmb.2006.0076.

Safely Filling Gaps with Partial Solutions Common to All Solutions.用所有解决方案共有的部分解决方案安全填补空白。

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):617-626. doi: 10.1109/TCBB.2017.2785831. Epub 2018 Jan 15.

Heuristic pairwise alignment of de Bruijn graphs to facilitate simultaneous transcript discovery in related organisms from RNA-Seq data.用于促进从RNA测序数据中同时发现相关生物体中转录本的de Bruijn图启发式成对比对。

BMC Genomics. 2015;16 Suppl 11(Suppl 11):S5. doi: 10.1186/1471-2164-16-S11-S5. Epub 2015 Nov 10.

The median problems on linear multichromosomal genomes: graph representation and fast exact solutions.线性多染色体基因组上的中位数问题：图形表示与快速精确解

J Comput Biol. 2010 Sep;17(9):1195-211. doi: 10.1089/cmb.2010.0106.

引用本文的文献

Floria: fast and accurate strain haplotyping in metagenomes.弗洛里亚：宏基因组中快速准确的菌株单倍型分型。

Bioinformatics. 2024 Jun 28;40(Suppl 1):i30-i38. doi: 10.1093/bioinformatics/btae252.

Accurate assembly of multiple RNA-seq samples with Aletsch.利用 Aletsch 对多个 RNA-seq 样本进行精确组装。

Bioinformatics. 2024 Jun 28;40(Suppl 1):i307-i317. doi: 10.1093/bioinformatics/btae215.

Transcript assembly and annotations: Bias and adjustment.转录本组装和注释：偏差与调整。

PLoS Comput Biol. 2023 Dec 21;19(12):e1011734. doi: 10.1371/journal.pcbi.1011734. eCollection 2023 Dec.

A safety framework for flow decomposition problems via integer linear programming.通过整数线性规划对流量分解问题进行安全框架构建。

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad640.

Accurate assembly of multi-end RNA-seq data with Scallop2.使用Scallop2对多端RNA测序数据进行精确组装。

Nat Comput Sci. 2022 Mar;2(3):148-152. doi: 10.1038/s43588-022-00216-1. Epub 2022 Mar 28.

Improving RNA Assembly via Safety and Completeness in Flow Decompositions.通过流分解中的安全性和完整性提高 RNA 组装

J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25.

Efficient Minimum Flow Decomposition via Integer Linear Programming.通过整数线性规划实现有效的最小流量分解。

J Comput Biol. 2022 Nov;29(11):1252-1267. doi: 10.1089/cmb.2022.0257. Epub 2022 Oct 18.

Exact transcript quantification over splice graphs.通过剪接图进行精确的转录本定量分析。

Algorithms Mol Biol. 2021 May 10;16(1):5. doi: 10.1186/s13015-021-00184-7.

文献检索

告别复杂PubMed语法，用中文像聊天一样搜索，搜遍4000万医学文献。AI智能推荐，让科研检索更轻松。

立即免费搜索

文件翻译

保留排版，准确专业，支持PDF/Word/PPT等文件格式，支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述，25分钟生成高质量综述，智能提取关键信息，辅助科研写作。

立即免费体验

最小路径流分解问题的理论与启发式方法。

Theory and A Heuristic for the Minimum Path Flow Decomposition Problem.

出版信息

相似文献

引用本文的文献

文献检索

文件翻译

深度研究

Suppr 超能文献

相似文献

引用本文的文献