BrownieAligner: accurate alignment of Illumina sequencing data to de Bruijn graphs.

Affiliations

Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium.

Bioinformatics Institute Ghent, Ghent, B-9052, Belgium.

Publication information

BMC Bioinformatics. 2018 Sep 4;19(1):311. doi: 10.1186/s12859-018-2319-7.

DOI: 10.1186/s12859-018-2319-7

PMID: 30180801

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6122196/

Abstract

BACKGROUND

Aligning short reads to a reference genome is an important task in many genome analysis pipelines. This task is computationally more complex when the reference genome is provided in the form of a de Bruijn graph instead of a linear sequence string.

RESULTS

We present a branch and bound alignment algorithm that uses the seed-and-extend paradigm to accurately align short Illumina reads to a graph. Given a seed, the algorithm greedily explores all branches of the search tree until the optimal alignment path is found. To reduce the search space, we compute an upper bound on the alignment score for each branch and discard the branch if it cannot improve on the best solution found so far. Additionally, by using a two-pass alignment strategy and a higher-order Markov model, paths in the de Bruijn graph that do not represent a subsequence of the original reference genome are discarded from the search procedure.

CONCLUSIONS

BrownieAligner is applied to both synthetic and real datasets. It generally outperforms other state-of-the-art tools in terms of accuracy, while having similar runtime and memory requirements. Our results show that using the higher-order Markov model in BrownieAligner improves accuracy, while the branch and bound algorithm reduces runtime. BrownieAligner is written in standard C++11 and released under the GPL license. BrownieAligner relies on multithreading to take advantage of multi-core/multi-CPU systems. The source code is available at: https://github.com/biointec/browniealigner.
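One way to picture how a two-pass strategy with a higher-order Markov model can discard graph paths that are absent from the reference is sketched below. This is a hedged illustration, not the authors' method: the function names, the choice of node-level contexts, and the fallback to all successors for unseen contexts are assumptions. The first pass is assumed to yield node paths from confidently aligned reads; the second pass then only follows a successor that was observed after the same preceding nodes.

```python
from collections import defaultdict


def train_transitions(paths, order):
    """Count which node follows each context of `order` preceding nodes."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for i in range(order, len(path)):
            context = tuple(path[i - order:i])
            counts[context][path[i]] += 1
    return counts


def allowed_successors(counts, graph, path, order):
    """Successors of path[-1] that were observed after the current context.

    Falls back to every graph successor when the path is shorter than
    `order` or the context was never observed (an assumed policy).
    """
    successors = graph.get(path[-1], [])
    context = tuple(path[-order:])
    if len(path) < order or context not in counts:
        return list(successors)
    seen = counts[context]
    return [node for node in successors if seen.get(node, 0) > 0]


# Toy repeat: node "R" is shared by two genomic contexts, A-R-C and B-R-D.
graph = {"A": ["R"], "B": ["R"], "R": ["C", "D"]}
first_pass_paths = [["A", "R", "C"], ["A", "R", "C"], ["B", "R", "D"]]
counts = train_transitions(first_pass_paths, order=2)

print(allowed_successors(counts, graph, ["A", "R"], order=2))  # ['C']
print(allowed_successors(counts, graph, ["B", "R"], order=2))  # ['D']
print(allowed_successors(counts, graph, ["R"], order=2))       # ['C', 'D']
```

In the toy graph, the paths A-R-D and B-R-C exist in the graph but were never observed, so a second-pass search that has already visited "A" and "R" would not branch into "D".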
