Telomere-to-telomere assembly by preserving contained reads.

Affiliation

Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India.

Publication information

Genome Res. 2024 Nov 20;34(11):1908-1918. doi: 10.1101/gr.279311.124.

DOI: 10.1101/gr.279311.124
PMID: 39406502
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11610600/
Abstract

Automated telomere-to-telomere (T2T) de novo assembly of diploid and polyploid genomes remains a formidable task. A string graph is a commonly used assembly graph representation in the assembly algorithms. The string graph formulation employs graph simplification heuristics, which drastically reduce the count of vertices and edges. One of these heuristics involves removing the reads contained in longer reads. In practice, this heuristic occasionally introduces gaps in the assembly by removing all reads that cover one or more genome intervals. The factors contributing to such gaps remain poorly understood. In this work, we mathematically derived the frequency of observing a gap near a germline and a somatic heterozygous variant locus. Our analysis shows that (1) an assembly gap due to contained read deletion is an order of magnitude more frequent in Oxford Nanopore Technologies (ONT) reads than Pacific Biosciences high-fidelity (PacBio HiFi) reads due to differences in their read-length distributions, and (2) this frequency decreases with an increase in the sequencing depth. Drawing cues from these observations, we addressed the weakness of the string graph formulation by developing the repeat-aware fragmenting tool (RAFT) assembly algorithm. RAFT addresses the issue of contained reads by fragmenting reads and producing a more uniform read-length distribution. The algorithm retains spanned repeats in the reads during the fragmentation. We empirically demonstrate that RAFT significantly reduces the number of gaps using simulated data sets. Using real ONT and PacBio HiFi data sets of the HG002 human genome, we achieved a twofold increase in the contig NG50 and the number of haplotype-resolved T2T contigs compared to hifiasm.
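
The gap mechanism described in the abstract can be illustrated with a toy model. The sketch below is neither RAFT nor hifiasm. It assumes two haplotypes that differ at a single heterozygous position, represents reads as error-free intervals, and uses per-haplotype coverage as a rough stand-in for the existence of a walk in the string graph. All names (`Read`, `drop_contained`, `fragment`, `uncovered`), the coordinates, and the uniform-window fragmentation (which ignores the repeat-aware part of RAFT) are assumptions made for illustration.

```python
from typing import List, NamedTuple

class Read(NamedTuple):
    start: int  # inclusive genome coordinate
    end: int    # exclusive genome coordinate
    hap: str    # haplotype of origin, "A" or "B"

HET = 15         # toy position of the only heterozygous locus
GENOME_LEN = 32  # toy genome length

def spans_het(r: Read) -> bool:
    return r.start <= HET < r.end

def same_sequence_within(r: Read, s: Read) -> bool:
    """True if r's sequence occurs inside s: s covers r's interval and the
    alleles agree (trivially so when r does not span the variant)."""
    covers = s.start <= r.start and r.end <= s.end
    return covers and (r.hap == s.hap or not spans_het(r))

def drop_contained(reads: List[Read]) -> List[Read]:
    """Contained-read heuristic: discard every read found inside another read."""
    kept = []
    for i, r in enumerate(reads):
        contained = False
        for j, s in enumerate(reads):
            if i == j or not same_sequence_within(r, s):
                continue
            longer = (s.end - s.start) > (r.end - r.start)
            if longer or j < i:  # identical reads: keep the first copy only
                contained = True
                break
        if not contained:
            kept.append(r)
    return kept

def uncovered(reads: List[Read], hap: str) -> List[int]:
    """Positions of `hap` with no read whose sequence matches that haplotype."""
    covered = set()
    for r in reads:
        if r.hap == hap or not spans_het(r):
            covered.update(range(r.start, r.end))
    return [p for p in range(GENOME_LEN) if p not in covered]

def fragment(reads: List[Read], max_len: int, step: int) -> List[Read]:
    """Cut long reads into overlapping windows of length max_len
    (plain uniform windows; an assumption, not RAFT's repeat-aware rule)."""
    out = []
    for r in reads:
        if r.end - r.start <= max_len:
            out.append(r)
            continue
        s = r.start
        while s + max_len < r.end:
            out.append(Read(s, s + max_len, r.hap))
            s += step
        out.append(Read(r.end - max_len, r.end, r.hap))  # last window ends at the read end
    return out

reads = [
    Read(0, 10, "A"), Read(8, 14, "A"), Read(12, 30, "A"),
    Read(0, 9, "B"), Read(6, 20, "B"), Read(18, 32, "B"),
]

print(uncovered(reads, "A"))                                  # [] before any removal
print(uncovered(drop_contained(reads), "A"))                  # [10, 11]
print(uncovered(drop_contained(fragment(reads, 8, 4)), "A"))  # []
```

In this toy input the haplotype A read (8, 14) does not span the variant, so its sequence is contained in the longer haplotype B read (6, 20), which does span it. Removing the contained read leaves positions 10 and 11 of haplotype A without any compatible read, whereas cutting the reads into 8-base windows first makes the lengths more uniform and the gap does not appear.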

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c44/11610600/177716d07e85/1908f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c44/11610600/8b640508b264/1908f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c44/11610600/97a3e5ae0fef/1908f03.jpg
