De novo assembly and phasing of a Korean human genome.
Author information
Genomic Medicine Institute (GMI), Medical Research Center, Seoul National University, Seoul 110-799, South Korea.
Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul 110-799, South Korea.
Publication information
Nature. 2016 Oct 13;538(7624):243-247. doi: 10.1038/nature20098. Epub 2016 Oct 5.
Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6.
This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
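The contig, scaffold and phased-block sizes above are all reported as N50 values. As a reading aid, the following is a minimal sketch of how an N50 is conventionally computed from a list of sequence lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are hypothetical illustrations; they are not AK1 data and this is not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total summed length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because the statistic is weighted by length, a few very long contigs or scaffolds can raise the N50 substantially even when many short sequences remain in the assembly.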