Suppr超能文献

通过整数线性规划对流量分解问题进行安全框架构建。

A safety framework for flow decomposition problems via integer linear programming.

机构信息

Department of Computer Science, University of Helsinki, Helsinki 00560, Finland.

School of Computing, Montana State University, Bozeman, MT 59717, United States.

出版信息

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad640.

Abstract

MOTIVATION

Many important problems in Bioinformatics (e.g. assembly or multiassembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding "safe" partial solutions (e.g. contigs) which are common to all solutions. Previous research on safety has focused on polynomially time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of "safety tools" for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, "minimum flow decomposition" (MFD). We obtain our results by developing a "safety test" for paths based on a general integer linear programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed to reduce the total ILP time, the most efficient of these being based on a recursive group-testing procedure.

RESULTS

Experimental results on transcriptome datasets show that all safe paths for MFDs correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths. Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27 000 non-trivial graphs of this dataset in only 1.5 h. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem.

AVAILABILITY AND IMPLEMENTATION

https://github.com/algbio/mfd-safety.

摘要

动机

生物信息学中的许多重要问题(例如组装或多组装)都有多个解决方案,而最终的目标是只报告一个解决方案。处理这种不确定性的一种常见方法是找到所有解决方案共有的“安全”部分解决方案(例如,连续体)。以前对安全性的研究集中在多项式时间可解的问题上,而许多成功的自然模型难以解决 NP 难题,导致缺乏此类问题的“安全工具”。我们提出了一种针对 NP 难题“最小流分解”(MFD)的计算所有安全解决方案的方法。我们通过开发一种基于通用整数线性规划(ILP)公式的路径“安全测试”来获得结果。此外,我们还提供了具有实际优化的实现,旨在减少总 ILP 时间,其中最有效的实现是基于递归分组测试过程。

结果

在转录组数据集上的实验结果表明,MFD 的所有安全路径正确地恢复了高达 90%的完整 RNA 转录本,比以前已知的安全路径至少多 25%。此外,尽管该问题具有 NP 难度,但我们可以在仅 1.5 小时内报告该数据集超过 27,000 个非平凡图的 99.8%的所有安全路径。我们的结果表明,在完美的数据中,RNA 组装问题的歧义性比想象的要小。

可用性和实现

https://github.com/algbio/mfd-safety。

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验