• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

通过整数线性规划实现有效的最小流量分解。

Efficient Minimum Flow Decomposition via Integer Linear Programming.

机构信息

Department of Computer Science, University of Helsinki, Helsinki, Finland.

School of Computing, Montana State University, Bozeman, Montana, USA.

出版信息

J Comput Biol. 2022 Nov;29(11):1252-1267. doi: 10.1089/cmb.2022.0257. Epub 2022 Oct 18.

DOI:10.1089/cmb.2022.0257
PMID:36260412
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC9700332/
Abstract

Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of the exponentially many solution paths using only a number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves instance in <13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on Github.

摘要

最小流量分解(MFD)是一个 NP 难问题,要求将网络流量分解为最小的路径集(以及相关的权重)。它的变体是生物信息学中多组装问题的强大模型,如 RNA 组装。由于其难度,实际的多组装工具要么使用启发式方法,要么解决问题的更简单、多项式时间可解的版本,这可能导致非最小或不完全分解流量的解决方案。在这里,我们提供了第一个基于整数线性规划(ILP)的无环流量网络上的 MFD 快速精确求解器。我们方法的关键是使用仅少数变量对指数数量的解路径进行编码。我们还将我们的 ILP 公式扩展到许多实际变体,例如合并更长或配对末端读取,或最小化流量误差。在模拟和真实流量拼接图上,我们的方法在 <13 秒内解决了实例。我们希望我们的配方可以成为未来实用的 RNA 组装工具的核心。我们的实现可以在 Github 上免费获得。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7265/9700332/d84899043bb2/cmb.2022.0257_figure3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7265/9700332/929ca328f369/cmb.2022.0257_figure1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7265/9700332/2a15945395e7/cmb.2022.0257_figure2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7265/9700332/d84899043bb2/cmb.2022.0257_figure3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7265/9700332/929ca328f369/cmb.2022.0257_figure1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7265/9700332/2a15945395e7/cmb.2022.0257_figure2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7265/9700332/d84899043bb2/cmb.2022.0257_figure3.jpg

相似文献

1
Efficient Minimum Flow Decomposition via Integer Linear Programming.通过整数线性规划实现有效的最小流量分解。
J Comput Biol. 2022 Nov;29(11):1252-1267. doi: 10.1089/cmb.2022.0257. Epub 2022 Oct 18.
2
A safety framework for flow decomposition problems via integer linear programming.通过整数线性规划对流量分解问题进行安全框架构建。
Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad640.
3
Accurate Flow Decomposition via Robust Integer Linear Programming.通过稳健整数线性规划实现精确流分解
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1955-1964. doi: 10.1109/TCBB.2024.3433523. Epub 2024 Dec 10.
4
Flow Decomposition With Subpath Constraints.具有子路径约束的流分解
IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):360-370. doi: 10.1109/TCBB.2022.3147697. Epub 2023 Feb 3.
5
Better ILP models for haplotype assembly.更好的用于单体型组装的 ILP 模型。
BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):52. doi: 10.1186/s12859-018-2012-x.
6
Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming.染色体结构:将某些具有不等基因含量和基因旁系同源物的问题简化为整数线性规划。
BMC Bioinformatics. 2017 Dec 6;18(1):537. doi: 10.1186/s12859-017-1944-x.
7
MultiTrans: An Algorithm for Path Extraction Through Mixed Integer Linear Programming for Transcriptome Assembly.MultiTrans:一种通过混合整数线性规划进行转录组组装的路径提取算法。
IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):48-56. doi: 10.1109/TCBB.2021.3083277. Epub 2022 Feb 3.
8
An Integer Programming Formulation of the Minimum Common String Partition Problem.最小公共字符串划分问题的整数规划公式化表述。
PLoS One. 2015 Jul 2;10(7):e0130266. doi: 10.1371/journal.pone.0130266. eCollection 2015.
9
Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants.重新审视图遍历编辑距离及其变体的复杂性和算法。
Algorithms Mol Biol. 2024 Apr 29;19(1):17. doi: 10.1186/s13015-024-00262-6.
10
Improving RNA Assembly via Safety and Completeness in Flow Decompositions.通过流分解中的安全性和完整性提高 RNA 组装
J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25.

引用本文的文献

1
Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants.重新审视图遍历编辑距离及其变体的复杂性和算法。
Algorithms Mol Biol. 2024 Apr 29;19(1):17. doi: 10.1186/s13015-024-00262-6.
2
Transcript assembly and annotations: Bias and adjustment.转录本组装和注释:偏差与调整。
PLoS Comput Biol. 2023 Dec 21;19(12):e1011734. doi: 10.1371/journal.pcbi.1011734. eCollection 2023 Dec.
3
Phables: from fragmented assemblies to high-quality bacteriophage genomes.噬菌体:从碎片化组装到高质量噬菌体基因组。

本文引用的文献

1
Deriving Ranges of Optimal Estimated Transcript Expression due to Nonidentifiability.由于不可识别性导致的最优转录本表达范围的推导。
J Comput Biol. 2022 Feb;29(2):121-139. doi: 10.1089/cmb.2021.0444. Epub 2022 Jan 17.
2
Jumper enables discontinuous transcript assembly in coronaviruses.跳跃基因使冠状病毒的不连续转录组装成为可能。
Nat Commun. 2021 Nov 18;12(1):6728. doi: 10.1038/s41467-021-26944-y.
3
MultiTrans: An Algorithm for Path Extraction Through Mixed Integer Linear Programming for Transcriptome Assembly.MultiTrans:一种通过混合整数线性规划进行转录组组装的路径提取算法。
Bioinformatics. 2023 Oct 3;39(10). doi: 10.1093/bioinformatics/btad586.
4
Phables: from fragmented assemblies to high-quality bacteriophage genomes.噬菌体组装拼接软件(Phables):从片段化组装到高质量噬菌体基因组
bioRxiv. 2023 Sep 11:2023.04.04.535632. doi: 10.1101/2023.04.04.535632.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):48-56. doi: 10.1109/TCBB.2021.3083277. Epub 2022 Feb 3.
4
Long-read transcriptome sequencing reveals abundant promoter diversity in distinct molecular subtypes of gastric cancer.长读转录组测序揭示了不同分子亚型胃癌中丰富的启动子多样性。
Genome Biol. 2021 Jan 22;22(1):44. doi: 10.1186/s13059-021-02261-x.
5
V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data.V-pipe:一种用于从高通量数据评估病毒遗传多样性的计算流程。
Bioinformatics. 2021 Jul 19;37(12):1673-1680. doi: 10.1093/bioinformatics/btab015.
6
Single-cell RNA counting at allele and isoform resolution using Smart-seq3.基于 Smart-seq3 技术进行等位基因和异构体分辨率的单细胞 RNA 计数
Nat Biotechnol. 2020 Jun;38(6):708-714. doi: 10.1038/s41587-020-0497-0. Epub 2020 May 4.
7
RefShannon: A genome-guided transcriptome assembler using sparse flow decomposition.RefShannon:一种基于基因组指导的使用稀疏流分解的转录组组装方法。
PLoS One. 2020 Jun 2;15(6):e0232946. doi: 10.1371/journal.pone.0232946. eCollection 2020.
8
Opportunities and challenges in long-read sequencing data analysis.长读测序数据分析中的机遇与挑战。
Genome Biol. 2020 Feb 7;21(1):30. doi: 10.1186/s13059-020-1935-5.
9
Transcriptome assembly from long-read RNA-seq alignments with StringTie2.基于长读 RNA-seq 比对的转录组组装与 StringTie2。
Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1.
10
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.基于图的基因组比对和基因分型与 HISAT2 和 HISAT-genotype。
Nat Biotechnol. 2019 Aug;37(8):907-915. doi: 10.1038/s41587-019-0201-4. Epub 2019 Aug 2.