• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

通过整数线性规划对流量分解问题进行安全框架构建。

A safety framework for flow decomposition problems via integer linear programming.

机构信息

Department of Computer Science, University of Helsinki, Helsinki 00560, Finland.

School of Computing, Montana State University, Bozeman, MT 59717, United States.

出版信息

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad640.

DOI:10.1093/bioinformatics/btad640
PMID:37862229
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC10628435/
Abstract

MOTIVATION

Many important problems in Bioinformatics (e.g. assembly or multiassembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding "safe" partial solutions (e.g. contigs) which are common to all solutions. Previous research on safety has focused on polynomially time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of "safety tools" for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, "minimum flow decomposition" (MFD). We obtain our results by developing a "safety test" for paths based on a general integer linear programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed to reduce the total ILP time, the most efficient of these being based on a recursive group-testing procedure.

RESULTS

Experimental results on transcriptome datasets show that all safe paths for MFDs correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths. Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27 000 non-trivial graphs of this dataset in only 1.5 h. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem.

AVAILABILITY AND IMPLEMENTATION

https://github.com/algbio/mfd-safety.

摘要

动机

生物信息学中的许多重要问题(例如组装或多组装)都有多个解决方案,而最终的目标是只报告一个解决方案。处理这种不确定性的一种常见方法是找到所有解决方案共有的“安全”部分解决方案(例如,连续体)。以前对安全性的研究集中在多项式时间可解的问题上,而许多成功的自然模型难以解决 NP 难题,导致缺乏此类问题的“安全工具”。我们提出了一种针对 NP 难题“最小流分解”(MFD)的计算所有安全解决方案的方法。我们通过开发一种基于通用整数线性规划(ILP)公式的路径“安全测试”来获得结果。此外,我们还提供了具有实际优化的实现,旨在减少总 ILP 时间,其中最有效的实现是基于递归分组测试过程。

结果

在转录组数据集上的实验结果表明,MFD 的所有安全路径正确地恢复了高达 90%的完整 RNA 转录本,比以前已知的安全路径至少多 25%。此外,尽管该问题具有 NP 难度,但我们可以在仅 1.5 小时内报告该数据集超过 27,000 个非平凡图的 99.8%的所有安全路径。我们的结果表明,在完美的数据中,RNA 组装问题的歧义性比想象的要小。

可用性和实现

https://github.com/algbio/mfd-safety。

相似文献

1
A safety framework for flow decomposition problems via integer linear programming.通过整数线性规划对流量分解问题进行安全框架构建。
Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad640.
2
Efficient Minimum Flow Decomposition via Integer Linear Programming.通过整数线性规划实现有效的最小流量分解。
J Comput Biol. 2022 Nov;29(11):1252-1267. doi: 10.1089/cmb.2022.0257. Epub 2022 Oct 18.
3
Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming.染色体结构:将某些具有不等基因含量和基因旁系同源物的问题简化为整数线性规划。
BMC Bioinformatics. 2017 Dec 6;18(1):537. doi: 10.1186/s12859-017-1944-x.
4
An Integer Programming Formulation of the Minimum Common String Partition Problem.最小公共字符串划分问题的整数规划公式化表述。
PLoS One. 2015 Jul 2;10(7):e0130266. doi: 10.1371/journal.pone.0130266. eCollection 2015.
5
Improving RNA Assembly via Safety and Completeness in Flow Decompositions.通过流分解中的安全性和完整性提高 RNA 组装
J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25.
6
A unified ILP framework for core ancestral genome reconstruction problems.用于核心祖先基因组重建问题的统一 ILP 框架。
Bioinformatics. 2020 May 1;36(10):2993-3003. doi: 10.1093/bioinformatics/btaa100.
7
Accurate Flow Decomposition via Robust Integer Linear Programming.通过稳健整数线性规划实现精确流分解
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1955-1964. doi: 10.1109/TCBB.2024.3433523. Epub 2024 Dec 10.
8
The Multi-State Perfect Phylogeny Problem with missing and removable data: solutions via integer-programming and chordal graph theory.存在缺失和可移除数据的多状态完美系统发育问题:通过整数规划和弦图理论求解
J Comput Biol. 2010 Mar;17(3):383-99. doi: 10.1089/cmb.2009.0200.
9
Computing the minimum recombinant haplotype configuration from incomplete genotype data on a pedigree by integer linear programming.通过整数线性规划从家系的不完整基因型数据计算最小重组单倍型构型。
J Comput Biol. 2005 Jul-Aug;12(6):719-39. doi: 10.1089/cmb.2005.12.719.
10
Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants.重新审视图遍历编辑距离及其变体的复杂性和算法。
Algorithms Mol Biol. 2024 Apr 29;19(1):17. doi: 10.1186/s13015-024-00262-6.

本文引用的文献

1
Improving RNA Assembly via Safety and Completeness in Flow Decompositions.通过流分解中的安全性和完整性提高 RNA 组装
J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25.
2
Assembler artifacts include misassembly because of unsafe unitigs and underassembly because of bidirected graphs.组装体伪影包括由于不安全的单元而导致的组装错误,以及由于双向图而导致的组装不足。
Genome Res. 2022 Sep 27;32(9):1746-1753. doi: 10.1101/gr.276601.122.
3
Flow Decomposition With Subpath Constraints.具有子路径约束的流分解
IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):360-370. doi: 10.1109/TCBB.2022.3147697. Epub 2023 Feb 3.
4
Deriving Ranges of Optimal Estimated Transcript Expression due to Nonidentifiability.由于不可识别性导致的最优转录本表达范围的推导。
J Comput Biol. 2022 Feb;29(2):121-139. doi: 10.1089/cmb.2021.0444. Epub 2022 Jan 17.
5
Safety in Multi-Assembly via Paths Appearing in All Path Covers of a DAG.通过有向无环图(DAG)的所有路径覆盖中出现的路径实现多组件中的安全性。
IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3673-3684. doi: 10.1109/TCBB.2021.3131203. Epub 2022 Dec 8.
6
Jumper enables discontinuous transcript assembly in coronaviruses.跳跃基因使冠状病毒的不连续转录组装成为可能。
Nat Commun. 2021 Nov 18;12(1):6728. doi: 10.1038/s41467-021-26944-y.
7
Exact transcript quantification over splice graphs.通过剪接图进行精确的转录本定量分析。
Algorithms Mol Biol. 2021 May 10;16(1):5. doi: 10.1186/s13015-021-00184-7.
8
V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data.V-pipe:一种用于从高通量数据评估病毒遗传多样性的计算流程。
Bioinformatics. 2021 Jul 19;37(12):1673-1680. doi: 10.1093/bioinformatics/btab015.
9
Opportunities and challenges in long-read sequencing data analysis.长读测序数据分析中的机遇与挑战。
Genome Biol. 2020 Feb 7;21(1):30. doi: 10.1186/s13059-020-1935-5.
10
Transcriptome assembly from long-read RNA-seq alignments with StringTie2.基于长读 RNA-seq 比对的转录组组装与 StringTie2。
Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1.