JTK: targeted diploid genome assembler.

Affiliation

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan.

Publication information

Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad398.

DOI:10.1093/bioinformatics/btad398
PMID:37354526
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10320103/
Abstract

MOTIVATION

Diploid assembly, or determining the sequences of homologous chromosomes separately, is essential to elucidate genetic differences between haplotypes. One approach is to call and phase single nucleotide variants (SNVs) on a reference sequence. However, this approach becomes unstable on large segmental duplications (SDs) or structural variations (SVs) because the alignments of reads deriving from these regions tend to be unreliable. Another approach is to use highly accurate PacBio HiFi reads to output a diploid assembly directly. Nonetheless, HiFi reads cannot phase homozygous regions longer than their length and require Oxford Nanopore Technologies (ONT) reads or Hi-C to produce a fully phased assembly. Is a single long-read sequencing technology sufficient to create an accurate diploid assembly?
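The read-length limitation described above can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a simplified model in which two neighbouring heterozygous sites can be phased together only if they lie closer than one read length, and the site positions and the 15 kb and 50 kb read lengths are made-up values chosen for illustration.

```python
def phase_blocks(het_positions, read_length):
    """Group heterozygous sites into blocks that single reads could link.

    Simplifying assumption: adjacent sites are linked only when the
    distance between them is smaller than the read length.
    """
    blocks = []
    for pos in sorted(het_positions):
        if blocks and pos - blocks[-1][-1] < read_length:
            blocks[-1].append(pos)   # a read can span both sites
        else:
            blocks.append([pos])     # homozygous gap too long: phase breaks
    return blocks


# Hypothetical heterozygous sites with a 42 kb homozygous stretch in the middle.
het_sites = [1_000, 9_000, 18_000, 60_000, 66_000]

short_read_blocks = phase_blocks(het_sites, read_length=15_000)
long_read_blocks = phase_blocks(het_sites, read_length=50_000)

print(len(short_read_blocks), "blocks with 15 kb reads")
print(len(long_read_blocks), "block(s) with 50 kb reads")
```

Under this model the shorter reads split the region into two unlinked phase blocks at the homozygous stretch, while the longer reads bridge it.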

RESULTS

Here, we present JTK, a megabase-scale diploid genome assembler. It first randomly samples kilobase-scale sequences (called 'chunks') from the long reads, phases variants found on them, and produces two haplotypes. The novel idea of JTK is to utilize chunks to capture SNVs and SVs simultaneously. From 60-fold ONT reads on HG002 and a Japanese sample, it fully assembled two haplotypes with approximately 99.9% accuracy on the major histocompatibility complex (MHC) and the leukocyte receptor complex (LRC) regions, which was impossible with the reference-based approach. In addition, in the LRC region of the Japanese sample, JTK output an assembly with better contiguity than those built from high-coverage HiFi+Hi-C. In the coming age of pan-genomics, JTK would complement the reference-based phasing method to assemble the difficult-to-assemble but medically important regions.
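As a rough illustration of the first step only, the sketch below draws fixed-length chunks at random positions from a set of reads. It is not JTK's implementation: the function name `sample_chunks`, the 2 kb chunk length, the number of chunks, and the synthetic reads are all assumptions made for this example, and the subsequent variant phasing and haplotype construction are not shown.

```python
import random


def sample_chunks(reads, chunk_len=2000, n_chunks=3, seed=0):
    """Randomly sample fixed-length substrings ('chunks') from long reads.

    Hypothetical helper for illustration; the parameters are not JTK's defaults.
    """
    rng = random.Random(seed)
    # Only reads at least as long as a chunk can contribute one.
    eligible = [r for r in reads if len(r) >= chunk_len]
    if not eligible:
        raise ValueError("no read is long enough to hold a chunk")
    chunks = []
    for _ in range(n_chunks):
        read = rng.choice(eligible)
        start = rng.randrange(len(read) - chunk_len + 1)
        chunks.append(read[start:start + chunk_len])
    return chunks


# Synthetic reads of 5 kb, 8 kb, and 1.5 kb (the last is too short to sample).
base_rng = random.Random(1)
reads = ["".join(base_rng.choice("ACGT") for _ in range(n))
         for n in (5000, 8000, 1500)]

chunks = sample_chunks(reads)
print([len(c) for c in chunks])
```

Because each chunk is a kilobase-scale stretch of sequence rather than a single position, differences between reads aligned to a chunk can in principle reflect both small variants and larger rearrangements, which is the intuition behind capturing SNVs and SVs together.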

AVAILABILITY AND IMPLEMENTATION

JTK is available at https://github.com/ban-m/jtk, and the datasets are available at https://doi.org/10.5281/zenodo.7790310 or JGAS000580 in DDBJ.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f92b/10320103/1ac1a222a47e/btad398f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f92b/10320103/3ed392c462b7/btad398f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f92b/10320103/f47aa38d6de7/btad398f3.jpg
