JTK: targeted diploid genome assembler.

Affiliation

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan.

Publication information

Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad398.

DOI:10.1093/bioinformatics/btad398

PMID:37354526

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10320103/

Abstract

MOTIVATION

Diploid assembly, that is, determining the sequences of homologous chromosomes separately, is essential to elucidate genetic differences between haplotypes. One approach is to call and phase single nucleotide variants (SNVs) on a reference sequence. However, this approach becomes unstable on large segmental duplications (SDs) or structural variations (SVs) because the alignments of reads deriving from these regions tend to be unreliable. Another approach is to use highly accurate PacBio HiFi reads to output a diploid assembly directly. Nonetheless, HiFi reads cannot phase homozygous regions longer than the reads themselves, and they require Oxford Nanopore Technologies (ONT) reads or Hi-C to produce a fully phased assembly. Is a single long-read sequencing technology sufficient to create an accurate diploid assembly?

RESULTS

Here, we present JTK, a megabase-scale diploid genome assembler. It first randomly samples kilobase-scale sequences (called 'chunks') from the long reads, phases variants found on them, and produces two haplotypes. The novel idea of JTK is to utilize chunks to capture SNVs and SVs simultaneously. From 60-fold ONT reads of HG002 and a Japanese sample, it fully assembled two haplotypes with approximately 99.9% accuracy on the major histocompatibility complex (MHC) and the leukocyte receptor complex (LRC) regions, which was not possible with the reference-based approach. In addition, in the LRC region of the Japanese sample, JTK output an assembly of better contiguity than those built from high-coverage HiFi+Hi-C. In the coming age of pan-genomics, JTK would complement the reference-based phasing method to assemble the difficult-to-assemble but medically important regions.
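The first step described above, random sampling of kilobase-scale chunks from long reads, can be illustrated with a minimal sketch. This is not JTK's implementation: the function name `sample_chunks`, the parameters `chunk_len`, `n_chunks`, and `seed`, and the representation of reads as plain strings are all assumptions made for illustration. Variant phasing and haplotype construction are not shown.

```python
import random


def sample_chunks(reads, chunk_len=2000, n_chunks=5, seed=0):
    """Pick `n_chunks` random substrings of length `chunk_len` from `reads`.

    Hypothetical helper for illustration only. Reads shorter than
    `chunk_len` are skipped. Returns (read_index, start, sequence) tuples.
    """
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    # Only reads long enough to contain a full chunk are eligible.
    eligible = [i for i, read in enumerate(reads) if len(read) >= chunk_len]
    if not eligible:
        return []
    chunks = []
    for _ in range(n_chunks):
        i = rng.choice(eligible)
        # randint is inclusive on both ends, so the chunk always fits.
        start = rng.randint(0, len(reads[i]) - chunk_len)
        chunks.append((i, start, reads[i][start:start + chunk_len]))
    return chunks


if __name__ == "__main__":
    # Toy reads; real ONT reads would come from a FASTQ/FASTA file.
    toy_reads = ["ACGT" * 1000, "A" * 100, "GATTACA" * 500]
    for read_idx, start, seq in sample_chunks(toy_reads, n_chunks=3):
        print(read_idx, start, len(seq))
```

In this sketch every chunk is an exact substring of one read. How JTK selects chunks and uses them to capture SNVs and SVs is described in the paper and the repository.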

AVAILABILITY AND IMPLEMENTATION

JTK is available at https://github.com/ban-m/jtk, and the datasets are available at https://doi.org/10.5281/zenodo.7790310 or JGAS000580 in DDBJ.
