Suppr超能文献

常树宽泛基因组图上的最大得分路径集

Maximum-scoring path sets on pangenome graphs of constant treewidth.

作者信息

Brejová Broňa, Gagie Travis, Herencsárová Eva, Vinař Tomáš

机构信息

Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia.

Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada.

出版信息

Front Bioinform. 2024 Jul 1;4:1391086. doi: 10.3389/fbinf.2024.1391086. eCollection 2024.

Abstract

We generalize a problem of finding maximum-scoring segment sets, previously studied by Csűrös (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004, 1, 139-150), from sequences to graphs. Namely, given a vertex-weighted graph and a non-negative startup penalty , we can find a set of vertex-disjoint paths in with maximum total score when each path's score is its vertices' total weight minus . We call this new problem (MSPS). We present an algorithm that has a linear-time complexity for graphs with a constant treewidth. Generalization from sequences to graphs allows the algorithm to be used on pangenome graphs representing several related genomes and can be seen as a common abstraction for several biological problems on pangenomes, including searching for CpG islands, ChIP-seq data analysis, analysis of region enrichment for functional elements, or simple chaining problems.

摘要

我们将Csűrös(《IEEE/ACM计算生物学与生物信息学汇刊》,2004年,第1卷,第139 - 150页)之前研究的寻找最大得分片段集的问题从序列推广到了图。具体而言,给定一个顶点加权图和一个非负的起始惩罚值,当每条路径的得分是其顶点的总权重减去该惩罚值时,我们可以在该图中找到一组顶点不相交的路径,使其总得分最大。我们将这个新问题称为(MSPS)。我们提出了一种算法,对于具有常数树宽的图,该算法具有线性时间复杂度。从序列到图的推广使得该算法可用于表示多个相关基因组的泛基因组图,并且可以被视为泛基因组上几个生物学问题的通用抽象,包括搜索CpG岛、ChIP - seq数据分析、功能元件区域富集分析或简单的连锁问题。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/455f/11246863/5837a62084f5/fbinf-04-1391086-g001.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验