Improving RNA Assembly via Safety and Completeness in Flow Decompositions.

Author information

Department of Computer Science and Engineering, IIT Roorkee, Roorkee, India.

Department of Computer Science, University of Helsinki, Helsinki, Finland.

Publication information

J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25.

Abstract

Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking and transportation planning to bioinformatics. In some applications, we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or the length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition in which the paths correspond to the underlying data that generated the flow. In these cases, no optimization criterion guarantees the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, that is, the paths that are subpaths of at least one path in every flow decomposition. In this work, we give the first characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021), and the popular greedy-width heuristic. On the one hand, we found that, besides maintaining perfect precision, our safe and complete algorithm reports significantly higher coverage than the other safe algorithms. On the other hand, although the greedy-width algorithm reports better coverage, it also reports significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms greedy-width on a unified metric (F-score) that considers both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has superior time and space performance, resulting in a better and more practical approach for bioinformatic applications of flow decomposition.
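
To make the notion of a flow decomposition concrete, the following is a minimal Python sketch of a greedy-width style heuristic: it repeatedly extracts the source-to-sink path with the largest bottleneck flow and subtracts that amount along the path. It is an illustration only, not the implementation evaluated in the paper. The function names (widest_path, greedy_width), the toy graph example_flow, and its node labels are all assumptions introduced here.

```python
from collections import defaultdict


def widest_path(flow, source, sink):
    """Return (bottleneck, path) for the source-to-sink path whose smallest
    edge flow is largest, using only edges with positive flow.
    Returns (0, ()) when no such path exists. Assumes the graph is a DAG."""
    succ = defaultdict(list)
    for (u, v), f in flow.items():
        if f > 0:
            succ[u].append(v)

    memo = {}

    def best(u):
        if u == sink:
            return (float("inf"), (sink,))
        if u in memo:
            return memo[u]
        result = (0, ())
        for v in succ[u]:
            width, tail = best(v)
            width = min(width, flow[(u, v)])
            if width > result[0]:
                result = (width, (u,) + tail)
        memo[u] = result
        return result

    return best(source)


def greedy_width(flow, source, sink):
    """Decompose a flow on a DAG into weighted paths by repeatedly
    removing the widest remaining source-to-sink path."""
    residual = dict(flow)  # work on a copy so the input stays unchanged
    paths = []
    while True:
        width, path = widest_path(residual, source, sink)
        if width == 0:
            break
        for u, v in zip(path, path[1:]):
            residual[(u, v)] -= width
        paths.append((path, width))
    return paths


# Hypothetical toy flow of total value 7 (flow is conserved at a, b, m, c, d).
example_flow = {
    ("s", "a"): 4, ("s", "b"): 3,
    ("a", "m"): 4, ("b", "m"): 3,
    ("m", "c"): 5, ("m", "d"): 2,
    ("c", "t"): 5, ("d", "t"): 2,
}

for path, weight in greedy_width(example_flow, "s", "t"):
    print(weight, "-".join(path))
# prints:
# 4 s-a-m-c-t
# 2 s-b-m-d-t
# 1 s-b-m-c-t
```

The same toy flow can also be decomposed into s-a-m-d-t with weight 2, s-a-m-c-t with weight 2, and s-b-m-c-t with weight 3. The heuristic returns only one of these decompositions, which illustrates why a reported path need not belong to the decomposition that actually generated the flow.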

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f96e/9807076/5cadeb9da51d/cmb.2022.0261_figure1.jpg
