Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads.

Affiliations

Department of Computer Science and Engineering, University of California, San Diego, San Diego CA, USA.

Program in Bioinformatics and Systems Biology, University of California, San Diego, San Diego CA, USA.

Publication information

Nat Biotechnol. 2022 Jul;40(7):1075-1081. doi: 10.1038/s41587-022-01220-6. Epub 2022 Feb 28.

DOI:10.1038/s41587-022-01220-6
PMID:35228706
Abstract

Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
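The abstract names a Bloom filter and de Bruijn graph construction but does not show how those pieces fit together. The sketch below illustrates the general textbook ideas only. It is not LJA's algorithm. A small Bloom filter keeps only k-mers seen at least twice, on the assumption that singletons are likely read errors. The remaining k-mers become edges of a de Bruijn graph whose nodes are (k-1)-mers, and a non-branching path is then spelled out. All class and function names, the filter size, the salted-hash scheme, and the count-twice error heuristic are illustrative assumptions. The sketch also leaves out reverse complements, sparse graphs, disjointigs, and multiplex (varying-k) graphs.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set

# Hypothetical toy helpers illustrating textbook ideas; NOT LJA's implementation.


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function (an assumption;
        # any family of independent hashes would do).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(16, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def solid_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Keep k-mers seen (probably) at least twice; singletons are treated as read errors."""
    seen_once = BloomFilter()
    solid: Set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if kmer in seen_once:
                solid.add(kmer)  # second (or later) sighting
            else:
                seen_once.add(kmer)  # first sighting goes only into the filter
    return solid


def de_bruijn_graph(kmers: Iterable[str]) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def spell_path(graph: Dict[str, List[str]], start: str) -> str:
    """Follow single out-edges from `start` and spell the sequence.

    Simplification: only out-degree is checked, not in-degree.
    """
    seq, node, visited = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:  # stop on cycles
            break
        visited.add(node)
        seq += node[-1]
    return seq


reads = ["ACGTAC", "ACGTAC", "ACGTTC"]  # third read carries a substitution error
graph = de_bruijn_graph(solid_kmers(reads, k=4))
print(spell_path(graph, "ACG"))  # ACGTAC
```

In these three toy reads, the error-containing k-mers CGTT and GTTC each occur only once. They are filtered out, so the walk from ACG spells ACGTAC. A Bloom filter can occasionally let a singleton through as a false positive. Accepting that inexactness in exchange for lower memory use is the typical reason to use one. According to the abstract, LJA goes much further than this toy. It builds de Bruijn graphs for large genomes and large k, uses sparse de Bruijn graphs and disjointig generation, and transforms the result into a multiplex de Bruijn graph with varying k-mer sizes.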

Similar articles

1
Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads.
Nat Biotechnol. 2022 Jul;40(7):1075-1081. doi: 10.1038/s41587-022-01220-6. Epub 2022 Feb 28.
2
SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs.
PeerJ. 2024 Sep 27;12:e18050. doi: 10.7717/peerj.18050. eCollection 2024.
3
Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations.
Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad100. Epub 2023 Nov 24.
4
Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.
BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.
5
Lossless indexing with counting de Bruijn graphs.
Genome Res. 2022 Sep 27;32(9):1754-1764. doi: 10.1101/gr.276607.122.
6
Assembly of long error-prone reads using de Bruijn graphs.
Proc Natl Acad Sci U S A. 2016 Dec 27;113(52):E8396-E8405. doi: 10.1073/pnas.1604560113. Epub 2016 Dec 12.
7
RResolver: efficient short-read repeat resolution within ABySS.
BMC Bioinformatics. 2022 Jun 21;23(1):246. doi: 10.1186/s12859-022-04790-z.
8
FastEtch: A Fast Sketch-Based Assembler for Genomes.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1091-1106. doi: 10.1109/TCBB.2017.2737999. Epub 2017 Sep 11.
9
Integration of string and de Bruijn graphs for genome assembly.
Bioinformatics. 2016 May 1;32(9):1301-7. doi: 10.1093/bioinformatics/btw011. Epub 2016 Jan 10.
10
Clover: a clustering-oriented de novo assembler for Illumina sequences.
BMC Bioinformatics. 2020 Nov 17;21(1):528. doi: 10.1186/s12859-020-03788-9.

Cited by

1
Telomere-to-telomere genome assembly uncovers Wolbachia-driven recurrent male bottleneck effect and selection in a sawfly.
Commun Biol. 2025 Aug 13;8(1):1211. doi: 10.1038/s42003-025-08629-0.
2
Evaluation of sequencing reads at scale using rdeval.
Bioinformatics. 2025 Jul 22. doi: 10.1093/bioinformatics/btaf416.
3
Genetic variation in recalcitrant repetitive regions of the genome.
Genome Res. 2025 Aug 5. doi: 10.1101/gr.280728.125.
4
A k-mer-based estimator of the substitution rate between repetitive sequences.
bioRxiv. 2025 Jun 25:2025.06.19.660607. doi: 10.1101/2025.06.19.660607.
5
K2R: Tinted de Bruijn graphs implementation for efficient read extraction from sequencing datasets.
Bioinform Adv. 2025 May 14;5(1):vbaf111. doi: 10.1093/bioadv/vbaf111. eCollection 2025.
6
OReO: optimizing read order for practical compression.
Bioinform Adv. 2025 Jun 3;5(1):vbaf128. doi: 10.1093/bioadv/vbaf128. eCollection 2025.
7
CloseRead: a tool for assessing assembly errors in immunoglobulin loci applied to vertebrate long-read genome assemblies.
Genome Biol. 2025 May 20;26(1):131. doi: 10.1186/s13059-025-03594-7.
8
Applying the Safe-And-Complete Framework to Practical Genome Assembly.
Leibniz Int Proc Inform. 2024;312. doi: 10.4230/LIPIcs.WABI.2024.8. Epub 2024 Aug 26.
9
RAmbler resolves complex repeats in human Chromosomes 8, 19, and X.
Genome Res. 2025 Apr 14;35(4):863-876. doi: 10.1101/gr.279308.124.
10
Evaluation of sequencing reads at scale using rdeval.
bioRxiv. 2025 Feb 8:2025.02.01.636073. doi: 10.1101/2025.02.01.636073.