Pairwise graph edit distance characterizes the impact of the construction method on pangenome graphs.

Author information

Dubois Siegfried, Zytnicki Matthias, Lemaitre Claire, Faraut Thomas

Affiliations

Univ Rennes, CNRS, Inria, IRISA-UMR 6074, F-35000, Rennes, France.

GenPhySE, Université de Toulouse, INRAE, ENVT, 31320, Castanet-Tolosan, France.

Publication information

Bioinformatics. 2025 May 9. doi: 10.1093/bioinformatics/btaf291.

Abstract

MOTIVATION

Pangenome variation graphs are increasingly used in genomics, with the aim of replacing a linear reference in a wide variety of analyses. Constructing a variation graph from a collection of chromosome-size genome sequences is a difficult task that is generally addressed with a number of heuristics. This raises the question of how far the construction method influences the resulting graph and the characterization of variability.

RESULTS

We aim to characterize the differences between variation graphs derived from the same set of genomes, using a metric that quantifies and pinpoints those differences. We designed a pairwise variation graph comparison algorithm that establishes an edit distance between variation graphs by threading the genomes through both graphs. We applied our method to pangenome graphs built from yeast and human chromosome collections. The results show that it effectively characterizes discordances between pangenome graph construction methods and scales to real datasets.
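The abstract only outlines how the edit distance is obtained. The Rust sketch below is a rough illustration of the idea under a simplifying assumption. It assumes each genome's path through a graph can be reduced to the lengths of the consecutive nodes it traverses, so the path induces a set of cut positions along that genome.

Under that assumption, turning one graph's segmentation of a genome into the other's needs two kinds of operation:

* one merge for every cut present only in the first graph;
* one split for every cut present only in the second graph.

The distance then sums these counts over all genomes. The function names (`breakpoints`, `path_edit_ops`, `graph_edit_distance`) are hypothetical. This is not the pancat compare implementation.

```rust
use std::collections::BTreeSet;

/// Cut positions (node boundaries) along one genome, given the lengths of the
/// consecutive nodes its path traverses. The genome end is dropped because it
/// is shared by every segmentation of the same sequence.
fn breakpoints(node_lengths: &[usize]) -> BTreeSet<usize> {
    let mut cuts = BTreeSet::new();
    let mut pos = 0;
    for &len in node_lengths {
        pos += len;
        cuts.insert(pos);
    }
    cuts.remove(&pos); // remove the final boundary (end of genome)
    cuts
}

/// (merges, splits) needed to turn the segmentation of path `a`
/// into the segmentation of path `b` (hypothetical simplification).
fn path_edit_ops(a: &[usize], b: &[usize]) -> (usize, usize) {
    let len_a: usize = a.iter().sum();
    let len_b: usize = b.iter().sum();
    assert_eq!(len_a, len_b, "both paths must spell the same genome length");
    let cuts_a = breakpoints(a);
    let cuts_b = breakpoints(b);
    let merges = cuts_a.difference(&cuts_b).count(); // cuts only in a
    let splits = cuts_b.difference(&cuts_a).count(); // cuts only in b
    (merges, splits)
}

/// Total edit operations over all genomes, assuming paths_a[i] and
/// paths_b[i] describe the same genome in the two graphs.
fn graph_edit_distance(paths_a: &[Vec<usize>], paths_b: &[Vec<usize>]) -> usize {
    paths_a
        .iter()
        .zip(paths_b)
        .map(|(a, b)| {
            let (merges, splits) = path_edit_ops(a, b);
            merges + splits
        })
        .sum()
}

fn main() {
    // Two toy genomes of length 10, segmented differently by two graphs.
    let graph_a = vec![vec![3, 2, 5], vec![4, 6]];
    let graph_b = vec![vec![3, 7], vec![2, 2, 6]];
    println!("{}", graph_edit_distance(&graph_a, &graph_b));
}
```

Pairing paths by their position in the two slices is a further simplification. A real comparison would match paths by genome name as read from the graph files.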

AVAILABILITY

pancat compare is published as free Rust software under the AGPL3.0 open source license. Source code and documentation are available at https://github.com/dubssieg/rs-pancat-compare. Snapshot available on Software Heritage at swh:1:dir:61acda8ba3dac1709ed60530147d3871831be629.

SUPPLEMENTARY INFORMATION

Supplementary data are available online at https://doi.org/10.5281/zenodo.10932489. Code to replicate figures and analysis is available online at https://github.com/dubssieg/pancat_paper.
