GIN-TONIC: non-hierarchical full-text indexing for graph genomes

Authors

Öztürk Ünsal, Mattavelli Marco, Ribeca Paolo

Affiliations

SCI-STI-MM, EPFL, ELB 118, Station 11, 1015, Lausanne, Switzerland.

Biomathematics and Statistics Scotland, The James Hutton Institute, Peter Guthrie Tait Road, EH9 3FD, Edinburgh, United Kingdom.

Publication information

NAR Genom Bioinform. 2024 Dec 11;6(4):lqae159. doi: 10.1093/nargab/lqae159. eCollection 2024 Dec.

DOI:10.1093/nargab/lqae159
PMID:39664816
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11632618/
Abstract

This paper presents a new data structure, GIN-TONIC (Graph INdexing Through Optimal Near INterval Compaction), designed to index arbitrary string-labelled directed graphs representing, for instance, pangenomes or transcriptomes. GIN-TONIC provides several capabilities not offered by other graph-indexing methods based on the FM-Index. It is non-hierarchical, handling a graph as a monolithic object; it indexes at nucleotide resolution all possible walks in the graph without the need to explicitly store them; it supports exact substring queries in polynomial time and space for all possible walk roots in the graph, even if there are exponentially many walks corresponding to such roots. Specific ad-hoc optimizations, such as precomputed caches, allow GIN-TONIC to achieve excellent performance for input graphs of various topologies and sizes. Robust scalability capabilities and a querying performance close to that of a linear FM-Index are demonstrated for two real-world applications on the scale of human pangenomes and transcriptomes. Source code and associated benchmarks are available on GitHub.
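
To make the query semantics concrete, the following is a minimal brute-force sketch of an exact substring query on a string-labelled directed graph. It is not GIN-TONIC's algorithm and uses no index: it simply follows edges whenever the pattern runs past the end of a node label, which can take exponential time on highly branching graphs, the cost that an index is meant to avoid. The function name `find_occurrences`, the dictionary-based graph representation, and the toy bubble graph are assumptions made for illustration, and node labels are assumed to be non-empty.

```python
from typing import Dict, List, Tuple


def find_occurrences(
    labels: Dict[int, str],
    edges: Dict[int, List[int]],
    pattern: str,
) -> List[Tuple[int, int]]:
    """Return sorted (node, offset) positions from which some walk spells `pattern`.

    Hypothetical helper for illustration only; assumes every label is non-empty.
    """

    def matches(node: int, offset: int, k: int) -> bool:
        # k is the number of pattern characters matched so far.
        label = labels[node]
        # Compare as many characters as fit inside the current node label.
        n = min(len(label) - offset, len(pattern) - k)
        if label[offset:offset + n] != pattern[k:k + n]:
            return False
        k += n
        if k == len(pattern):
            return True
        # The pattern continues past this label: try every outgoing edge.
        return any(matches(nxt, 0, k) for nxt in edges.get(node, []))

    hits = []
    for node, label in labels.items():
        for offset in range(len(label)):
            if matches(node, offset, 0):
                hits.append((node, offset))
    return sorted(hits)


# Toy graph with one single-nucleotide bubble: ACG -> (T | A) -> GGA.
labels = {0: "ACG", 1: "T", 2: "A", 3: "GGA"}
edges = {0: [1, 2], 1: [3], 2: [3]}

print(find_occurrences(labels, edges, "GTG"))
print(find_occurrences(labels, edges, "GA"))
```

Each hit is reported as a (node, offset) pair, that is, at nucleotide resolution rather than per node. The two walks of the toy graph spell ACGTGGA and ACGAGGA, and neither string is ever materialised during the search.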

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d028/11632618/307684d442ad/lqae159fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d028/11632618/6a01839bce87/lqae159fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d028/11632618/b8bc28977ce7/lqae159fig3.jpg