Efficient dynamic variation graphs.

Affiliations

Genomics Institute, Santa Cruz, CA 95064, USA.

Biomolecular Engineering and Bioinformatics, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

Publication information

Bioinformatics. 2021 Jan 29;36(21):5139-5144. doi: 10.1093/bioinformatics/btaa640.

Abstract

MOTIVATION

Pangenomics is a growing field within computational genomics. Many pangenomic analyses use bidirected sequence graphs as their core data model. However, implementing and correctly using this data model can be difficult, and pangenomic datasets can be challenging to work with at scale. These challenges have impeded progress in this field.

RESULTS

Here, we present a stack of two C++ libraries, libbdsg and libhandlegraph, which use a simple, field-proven interface, designed to expose elementary features of these graphs while preventing common graph manipulation mistakes. The libraries also provide a Python binding. Using a diverse collection of pangenome graphs, we demonstrate that these tools allow for efficient construction and manipulation of large genome graphs with dense variation. For instance, the speed and memory usage are up to an order of magnitude better than the prior graph implementation in the VG toolkit, which has now transitioned to using libbdsg's implementations.
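To make the data model concrete, the following is a minimal, self-contained Python sketch of a bidirected sequence graph accessed through orientation-aware handles. It is a toy written for illustration only: the class `ToyBidirectedGraph`, its method names, and the tuple representation of a handle are assumptions made here and should not be read as the actual libhandlegraph or libbdsg API. The point it tries to show is that storing each edge together with its mirror image means that walking a path backwards automatically spells the reverse complement.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


class ToyBidirectedGraph:
    """Hypothetical toy model; a handle is a (node_id, is_reverse) tuple."""

    def __init__(self):
        self._seqs = {}                  # node id -> forward-strand sequence
        self._next = defaultdict(list)   # handle -> handles reachable on its right
        self._next_id = 1

    def create_handle(self, sequence):
        """Add a node and return its forward-orientation handle."""
        node_id = self._next_id
        self._next_id += 1
        self._seqs[node_id] = sequence
        return (node_id, False)

    def flip(self, handle):
        """Return the handle for the same node in the opposite orientation."""
        node_id, is_reverse = handle
        return (node_id, not is_reverse)

    def get_sequence(self, handle):
        """Return the node sequence as read in the handle's orientation."""
        node_id, is_reverse = handle
        seq = self._seqs[node_id]
        return reverse_complement(seq) if is_reverse else seq

    def create_edge(self, left, right):
        """Connect left -> right, also recording the mirrored traversal."""
        if right not in self._next[left]:
            self._next[left].append(right)
        # The same edge read from the other strand: flip(right) -> flip(left).
        mirror_from, mirror_to = self.flip(right), self.flip(left)
        if mirror_to not in self._next[mirror_from]:
            self._next[mirror_from].append(mirror_to)

    def follow_edges(self, handle, go_left=False):
        """List the handles adjacent to `handle` on its right (or left) side."""
        if not go_left:
            return list(self._next[handle])
        # Left neighbours are the flipped right neighbours of the flipped handle.
        return [self.flip(h) for h in self._next[self.flip(handle)]]


graph = ToyBidirectedGraph()
a = graph.create_handle("GAT")
b = graph.create_handle("TACA")
graph.create_edge(a, b)


def spell(walk):
    """Concatenate the sequences seen along a walk of handles."""
    return "".join(graph.get_sequence(h) for h in walk)


forward_walk = [a, b]
backward_walk = [graph.flip(b), graph.flip(a)]
print(spell(forward_walk))   # GATTACA
print(spell(backward_walk))  # TGTAATC
```

In this sketch the caller never deals with strands directly: orientation travels with the handle, which is one way an interface can guard against the orientation mistakes that the abstract alludes to.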

AVAILABILITY AND IMPLEMENTATION

libhandlegraph and libbdsg are available under an MIT License from https://github.com/vgteam/libhandlegraph and https://github.com/vgteam/libbdsg.

Similar articles

1
Efficient dynamic variation graphs.
Bioinformatics. 2021 Jan 29;36(21):5139-5144. doi: 10.1093/bioinformatics/btaa640.
2
ODGI: understanding pangenome graphs.
Bioinformatics. 2022 Jun 27;38(13):3319-3326. doi: 10.1093/bioinformatics/btac308.
3
Unbiased pangenome graphs.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac743.
4
Distance indexing and seed clustering in sequence graphs.
Bioinformatics. 2020 Jul 1;36(Suppl_1):i146-i153. doi: 10.1093/bioinformatics/btaa446.
5
Haplotype-aware graph indexes.
Bioinformatics. 2020 Jan 15;36(2):400-407. doi: 10.1093/bioinformatics/btz575.
7
GBZ file format for pangenome graphs.
Bioinformatics. 2022 Nov 15;38(22):5012-5018. doi: 10.1093/bioinformatics/btac656.
9
A Sequence Distance Graph framework for genome assembly and analysis.
F1000Res. 2019 Aug 23;8:1490. doi: 10.12688/f1000research.20233.1. eCollection 2019.

Cited by

3
Pangenome comparison via ED strings.
Front Bioinform. 2024 Sep 26;4:1397036. doi: 10.3389/fbinf.2024.1397036. eCollection 2024.
8
A draft human pangenome reference.
Nature. 2023 May;617(7960):312-324. doi: 10.1038/s41586-023-05896-x. Epub 2023 May 10.