Read mapping on de Bruijn graphs.

Authors

Antoine Limasset, Bastien Cazaux, Eric Rivals, Pierre Peterlongo

Affiliations

IRISA Inria Rennes Bretagne Atlantique, GenScale team, Campus de Beaulieu, Rennes, 35042, France.

L.I.R.M.M., UMR 5506, Université de Montpellier et CNRS, 860 rue de St Priest, Montpellier Cedex 5, F-34392, France.

Publication information

BMC Bioinformatics. 2016 Jun 16;17(1):237. doi: 10.1186/s12859-016-1103-9.

PMID: 27306641
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4910249/
Abstract

BACKGROUND

Next Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig describes the sequence found along some path of the assembly graph; however, the set of contigs does not record all the sequence information contained in that graph. Although many subsequent analyses can be performed with the set of contigs, one may ask whether mapping reads on the contigs is as informative as mapping them on the paths of the assembly graph. Currently, practical tools to perform mapping on such graphs are lacking.
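To make the relationship between reads, the de Bruijn graph and its unitigs concrete, here is a minimal Python sketch (an illustration, not the authors' code). It builds a node-centric de Bruijn graph from the k-mers of a few reads and compacts each maximal non-branching path into a unitig. For brevity it assumes forward-strand reads only and ignores reverse complements, sequencing errors and k-mer abundance filtering, all of which real assemblers must handle; the function names `kmers`, `build_dbg` and `unitigs` are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every k-mer (length-k substring) of seq from left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(reads: list[str], k: int):
    """Node-centric de Bruijn graph (forward strand only, no error filtering).

    Nodes are the distinct k-mers of the reads; u -> v is an edge when the
    last k-1 bases of u equal the first k-1 bases of v.
    """
    nodes = {km for read in reads for km in kmers(read, k)}
    succ: dict[str, set[str]] = defaultdict(set)
    pred: dict[str, set[str]] = defaultdict(set)
    for u in nodes:
        for base in "ACGT":
            v = u[1:] + base
            if v in nodes:
                succ[u].add(v)
                pred[v].add(u)
    return nodes, succ, pred


def unitigs(nodes, succ, pred) -> list[str]:
    """Compact every maximal non-branching path into one unitig string."""

    def mergeable(u: str) -> bool:
        # u -> v can be merged if v is u's only successor and u is v's only predecessor
        if len(succ[u]) != 1:
            return False
        v = next(iter(succ[u]))
        return len(pred[v]) == 1

    def is_start(v: str) -> bool:
        # v opens a unitig unless it is the forced continuation of its single predecessor
        if len(pred[v]) != 1:
            return True
        return not mergeable(next(iter(pred[v])))

    result = []
    for start in sorted(nodes):
        if not is_start(start):
            continue
        seq, cur = start, start
        while mergeable(cur):
            cur = next(iter(succ[cur]))
            seq += cur[-1]  # each further k-mer contributes one new base
        result.append(seq)
    # Isolated cycles without any branching node have no start and are skipped here.
    return result


print(unitigs(*build_dbg(["ACGTTGCA", "ACGTAGCA"], k=4)))  # ['ACGT', 'CGTAGCA', 'CGTTGCA']
```

In this toy input a single substitution between the two reads creates a branch after the k-mer `ACGT`, so the graph keeps both alleles as separate unitigs, and a read covering the branch point can only be placed by following a path from `ACGT` into one of its two successors.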

RESULTS

Here, we propose a formal definition of mapping on a de Bruijn graph, analyse the complexity of the problem, which turns out to be NP-complete, and provide a practical solution. We propose a pipeline called GGMAP (Greedy Graph MAPping). Its novelty is a procedure to map reads on branching paths of the graph, for which we designed a heuristic algorithm called BGREAT (de Bruijn Graph REAd mapping Tool). For the sake of efficiency, BGREAT rewrites a read sequence as a succession of unitig sequences. GGMAP can map millions of reads per CPU hour on a de Bruijn graph built from a large set of human genomic reads. Surprisingly, results show that up to 22% more reads can be mapped on the graph but not on the contig set.
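The idea of rewriting a read as a succession of unitigs can be illustrated with a simplified greedy mapper. This Python sketch is not BGREAT itself and rests on several assumptions: it only counts substitutions (no indels), tries every start position exhaustively instead of using an index, treats two unitigs as consecutive when they overlap by exactly k-1 bases, and at each branch keeps only the locally cheapest successor without backtracking. The names `successors`, `hamming` and `greedy_map` and the `max_mm` mismatch budget are hypothetical.

```python
from typing import Optional


def successors(unitigs: list[str], k: int) -> dict[int, list[int]]:
    """Unitig j can follow unitig i when the last k-1 bases of i equal the first k-1 of j."""
    return {
        i: [j for j, v in enumerate(unitigs) if u[-(k - 1):] == v[:k - 1]]
        for i, u in enumerate(unitigs)
    }


def hamming(a: str, b: str) -> int:
    """Count substitutions between two strings over their common length."""
    return sum(x != y for x, y in zip(a, b))


def greedy_map(read: str, unitigs: list[str], k: int, max_mm: int) -> Optional[tuple[int, list[int]]]:
    """Rewrite `read` as a path of unitig indices, allowing at most `max_mm` substitutions.

    Returns (mismatches, path) for the best start found, or None if no path fits.
    """
    succ = successors(unitigs, k)
    best = None
    for i, u in enumerate(unitigs):
        for offset in range(len(u)):
            # Place read[0] at position `offset` of unitig i.
            chunk = u[offset:offset + len(read)]
            mm = hamming(read, chunk)
            pos, cur, path = len(chunk), i, [i]  # pos = read bases placed so far
            while mm <= max_mm and pos < len(read):
                # A successor repeats the last k-1 bases of `cur`; only its tail is new.
                options = []
                for j in succ[cur]:
                    tail = unitigs[j][k - 1:][:len(read) - pos]
                    options.append((hamming(read[pos:], tail), j, len(tail)))
                if not options:
                    break  # dead end before the read is fully placed
                add_mm, cur, used = min(options)  # greedy: cheapest branch, never revisited
                mm += add_mm
                pos += used
                path.append(cur)
            if pos == len(read) and mm <= max_mm and (best is None or mm < best[0]):
                best = (mm, path)
    return best


U = ["ACGT", "CGTAGCA", "CGTTGCA"]
print(greedy_map("ACGTTGCA", U, k=4, max_mm=0))  # (0, [0, 2])
```

Never revisiting a branch choice keeps each extension cheap, but a branch that looks best over its own bases can lead into a dead end that an exhaustive search would have avoided; this trade-off is the reason a heuristic is used for a problem that is NP-complete in general.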

CONCLUSIONS

Although mapping reads on a de Bruijn graph is a complex task, our proposal offers a practical solution that combines efficiency with an improved mapping capacity compared to assembly-based mapping, even for complex eukaryotic data.
