Brejová Broňa, Gagie Travis, Herencsárová Eva, Vinař Tomáš
Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia.
Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada.
Front Bioinform. 2024 Jul 1;4:1391086. doi: 10.3389/fbinf.2024.1391086. eCollection 2024.
We generalize a problem of finding maximum-scoring segment sets, previously studied by Csűrös (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004, 1, 139-150), from sequences to graphs. Namely, given a vertex-weighted graph and a non-negative startup penalty , we can find a set of vertex-disjoint paths in with maximum total score when each path's score is its vertices' total weight minus . We call this new problem (MSPS). We present an algorithm that has a linear-time complexity for graphs with a constant treewidth. Generalization from sequences to graphs allows the algorithm to be used on pangenome graphs representing several related genomes and can be seen as a common abstraction for several biological problems on pangenomes, including searching for CpG islands, ChIP-seq data analysis, analysis of region enrichment for functional elements, or simple chaining problems.
我们将Csűrös(《IEEE/ACM计算生物学与生物信息学汇刊》,2004年,第1卷,第139 - 150页)之前研究的寻找最大得分片段集的问题从序列推广到了图。具体而言,给定一个顶点加权图和一个非负的起始惩罚值,当每条路径的得分是其顶点的总权重减去该惩罚值时,我们可以在该图中找到一组顶点不相交的路径,使其总得分最大。我们将这个新问题称为(MSPS)。我们提出了一种算法,对于具有常数树宽的图,该算法具有线性时间复杂度。从序列到图的推广使得该算法可用于表示多个相关基因组的泛基因组图,并且可以被视为泛基因组上几个生物学问题的通用抽象,包括搜索CpG岛、ChIP - seq数据分析、功能元件区域富集分析或简单的连锁问题。