Improving RNA Assembly via Safety and Completeness in Flow Decompositions.

Affiliations

Department of Computer Science and Engineering, IIT Roorkee, Roorkee, India.

Department of Computer Science, University of Helsinki, Helsinki, Finland.

Publication information

J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25.

DOI:10.1089/cmb.2022.0261

PMID:36288562

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9807076/

Abstract

Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking and transportation planning to bioinformatics. In some applications we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or the length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow. In these cases, no optimization criterion guarantees the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, that is, the paths that are subpaths of at least one path in every flow decomposition. In this work, we give the first characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021), and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports a significantly higher coverage than the other safe algorithms. On the other hand, although the greedy-width algorithm reports better coverage, it also reports significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms greedy-width on a unified metric (F-score) that considers both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has superior time and space performance, resulting in a better and more practical approach for bioinformatic applications of flow decomposition.
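
To make the problem setting concrete, the following is a minimal sketch of the greedy-width heuristic as it is commonly described: repeatedly extract the source-to-sink path with the largest bottleneck flow and subtract that flow from its edges. This is not the authors' safe-path algorithm or their implementation. The function names, the toy graph, and its edge values are hypothetical and chosen only for illustration, and the input is assumed to be a valid flow on a directed acyclic graph.

```python
from collections import defaultdict


def widest_path(flow, source, sink):
    """Return (bottleneck, path) for the source-sink path whose minimum
    edge flow is largest, using only edges with positive remaining flow."""
    adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for (u, v), f in flow.items():
        nodes.update((u, v))
        if f > 0:
            adj[u].append(v)
            indeg[v] += 1
    # Topological order via Kahn's algorithm (the graph is assumed acyclic).
    order = []
    stack = [n for n in nodes if indeg[n] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    # Dynamic programming: best[v] is the widest bottleneck from source to v.
    best = {source: float("inf")}
    parent = {}
    for u in order:
        if u not in best:
            continue
        for v in adj[u]:
            width = min(best[u], flow[(u, v)])
            if width > best.get(v, 0):
                best[v] = width
                parent[v] = u
    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


def greedy_width(flow, source, sink):
    """Decompose a DAG flow into weighted paths, widest path first."""
    remaining = dict(flow)
    paths = []
    while True:
        width, path = widest_path(remaining, source, sink)
        if width == 0:
            break
        for u, v in zip(path, path[1:]):
            remaining[(u, v)] -= width
        paths.append((width, path))
    return paths


# Hypothetical toy flow: 8 units from "s" to "t" through a shared node "m".
toy_flow = {
    ("s", "a"): 5, ("s", "b"): 3,
    ("a", "m"): 5, ("b", "m"): 3,
    ("m", "c"): 6, ("m", "d"): 2,
    ("c", "t"): 6, ("d", "t"): 2,
}
decomposition = greedy_width(toy_flow, "s", "t")
for weight, path in decomposition:
    print(weight, "-".join(path))
```

On this toy flow the heuristic returns three paths with weights 5, 2, and 1. The same flow is also explained by the paths s-a-m-d-t (weight 2), s-a-m-c-t (weight 3), and s-b-m-c-t (weight 3), which illustrates why a single reported decomposition need not match the data that generated the flow.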

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f96e/9807076/5cadeb9da51d/cmb.2022.0261_figure1.jpg

References

Deriving Ranges of Optimal Estimated Transcript Expression due to Nonidentifiability.

J Comput Biol. 2022 Feb;29(2):121-139. doi: 10.1089/cmb.2021.0444. Epub 2022 Jan 17.

Safety in Multi-Assembly via Paths Appearing in All Path Covers of a DAG.

IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3673-3684. doi: 10.1109/TCBB.2021.3131203. Epub 2022 Dec 8.

Haploflow: strain-resolved de novo assembly of viral genomes.

Genome Biol. 2021 Jul 19;22(1):212. doi: 10.1186/s13059-021-02426-8.

Alignment and mapping methodology influence transcript abundance estimation.

Genome Biol. 2020 Sep 7;21(1):239. doi: 10.1186/s13059-020-02151-8.

TransBorrow: genome-guided transcriptome assembly by borrowing assemblies from different assemblers.

Genome Res. 2020 Aug;30(8):1181-1190. doi: 10.1101/gr.257766.119. Epub 2020 Aug 17.

Full-length de novo viral quasispecies assembly through variation graph construction.

Bioinformatics. 2019 Dec 15;35(24):5086-5094. doi: 10.1093/bioinformatics/btz443.

Ryūtō: network-flow based transcriptome reconstruction.

BMC Bioinformatics. 2019 Apr 16;20(1):190. doi: 10.1186/s12859-019-2786-5.

Theory and A Heuristic for the Minimum Path Flow Decomposition Problem.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):658-670. doi: 10.1109/TCBB.2017.2779509. Epub 2017 Dec 4.

A safe and complete algorithm for metagenomic assembly.

Algorithms Mol Biol. 2018 Feb 7;13:3. doi: 10.1186/s13015-018-0122-7. eCollection 2018.

Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.

PLoS Comput Biol. 2017 Nov 27;13(11):e1005851. doi: 10.1371/journal.pcbi.1005851. eCollection 2017 Nov.
