PangeBlocks: customized construction of pangenome graphs via maximal blocks.

Affiliations

Department of Informatics, Systems, and Communications, University of Milano - Bicocca, Viale Sarca, 20126, Milano, Italy.

Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Mlynská dolina F1, Bratislava, 84248, Slovakia.

Publication information

BMC Bioinformatics. 2024 Nov 4;25(1):344. doi: 10.1186/s12859-024-05958-5.

Abstract

BACKGROUND

The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build a pangenome graph using heuristics, without stating an explicit optimization criterion. It is therefore unclear how a specific optimization criterion affects the graph topology and downstream analyses such as read mapping and variant calling.

RESULTS

In this paper, by leveraging the notion of maximal block in a Multiple Sequence Alignment (MSA), we reframe the pangenome graph construction problem as an exact cover problem on blocks, called Minimum Weighted Block Cover (MWBC). We then propose an Integer Linear Programming (ILP) formulation of the MWBC problem that allows us to study the most natural objective functions for building a graph. We provide an implementation of the ILP approach for solving MWBC and evaluate it on SARS-CoV-2 complete genomes, showing that different objective functions lead to pangenome graphs with different properties, which suggests that the specific downstream task can drive the graph construction phase.
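As a rough illustration of the exact-cover view described above, the following Python sketch enumerates candidate blocks of a toy three-row alignment and searches exhaustively for a minimum-weight exact cover of its cells. It is not the authors' ILP implementation. The toy MSA, the function names (`candidate_blocks`, `cells`, `min_weight_exact_cover`), the restriction of candidates to groups of rows that spell the same substring over a column interval, and the two weight functions are all assumptions made for illustration, and a brute-force search stands in for an ILP solver.

```python
from collections import defaultdict


def candidate_blocks(msa):
    """Enumerate candidate blocks as (rows, start, end, label).

    Assumption for this sketch: for each column interval [start, end],
    the rows are grouped by the substring they spell, and each group
    yields one block. Gaps are not handled.
    """
    n_cols = len(msa[0])
    blocks = []
    for start in range(n_cols):
        for end in range(start, n_cols):
            groups = defaultdict(list)
            for row, seq in enumerate(msa):
                groups[seq[start:end + 1]].append(row)
            for label, rows in groups.items():
                blocks.append((frozenset(rows), start, end, label))
    return blocks


def cells(block):
    """Return the set of (row, column) cells of the MSA covered by a block."""
    rows, start, end, _ = block
    return frozenset((r, c) for r in rows for c in range(start, end + 1))


def min_weight_exact_cover(msa, weight):
    """Exhaustively search for a minimum-weight exact cover of all MSA cells.

    `weight` maps a block to a positive number. Exponential in the worst
    case, so this is only meant for tiny inputs.
    """
    blocks = candidate_blocks(msa)
    block_cells = [cells(b) for b in blocks]
    all_cells = frozenset(
        (r, c) for r in range(len(msa)) for c in range(len(msa[0]))
    )
    best = {"cost": float("inf"), "cover": None}

    def search(uncovered, chosen, cost):
        if cost >= best["cost"]:
            return  # cannot improve on the best cover found so far
        if not uncovered:
            best["cost"], best["cover"] = cost, list(chosen)
            return
        target = min(uncovered)  # some block must cover this cell
        for block, covered in zip(blocks, block_cells):
            # "covered <= uncovered" forbids overlaps, so the cover is exact
            if target in covered and covered <= uncovered:
                chosen.append(block)
                search(uncovered - covered, chosen, cost + weight(block))
                chosen.pop()

    search(all_cells, [], 0)
    return best["cost"], best["cover"]


# Hypothetical toy alignment: rows 0 and 1 are identical, row 2 has one variant.
msa = ["ACGT", "ACGT", "ATGT"]

# Illustrative objective 1: every block costs 1 (fewest blocks).
nodes_cost, nodes_cover = min_weight_exact_cover(msa, weight=lambda b: 1)

# Illustrative objective 2: a block costs the length of its label.
length_cost, length_cover = min_weight_exact_cover(msa, weight=lambda b: len(b[3]))

print(nodes_cost, sorted(b[3] for b in nodes_cover))
print(length_cost, sorted(b[3] for b in length_cover))
```

On this toy input, the unit weight favours two long blocks, one per distinct sequence, whereas the label-length weight favours splitting at the variant column so that shared columns are stored only once. This mirrors, on a very small scale, the abstract's point that the choice of objective shapes the resulting graph.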

CONCLUSION

We show that customizing the construction of a pangenome graph through the choice of objective function has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way to novel practical approaches to graph representations of an MSA in which the user can guide the construction.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbc8/11533710/a1bc65709f66/12859_2024_5958_Fig1_HTML.jpg
