Huddleston John, Ranade Swati, Malig Maika, Antonacci Francesca, Chaisson Mark, Hon Lawrence, Sudmant Peter H, Graves Tina A, Alkan Can, Dennis Megan Y, Wilson Richard K, Turner Stephen W, Korlach Jonas, Eichler Evan E
Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA;
Genome Res. 2014 Apr;24(4):688-96. doi: 10.1101/gr.168450.113. Epub 2014 Jan 13.
Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single-molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and found that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved its structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher-quality finished state.