Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025, USA.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.
Genes (Basel). 2019 Jan 18;10(1):62. doi: 10.3390/genes10010062.
A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for the standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality genome assembly from a single mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over one-third of the genome. Because the material sequenced and assembled came from a single diploid individual, only two haplotypes were present, which simplified the assembly process compared with samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size.
This new low-input approach puts PacBio-based assemblies within reach for small, highly heterozygous organisms that comprise much of the diversity of life.
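The contiguity figure quoted above (contig N50 3.5 Mb) follows the usual N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that calculation is shown below. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (e.g. in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, chosen only to illustrate the definition.
toy_contigs = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_contigs))
```

With these toy values the total length is 32, and the two longest contigs already cover half of it, so the reported N50 is the length of the second of them.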