Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63108, USA.
BMC Genomics. 2012 Dec 6;13:683. doi: 10.1186/1471-2164-13-683.
Rare genetic variation in the human population is a major source of pathophysiological variability and has been implicated in a host of complex phenotypes and diseases. Finding disease-related genes harboring disparate functional rare variants requires sequencing many individuals across many genomic regions and comparing them against unaffected cohorts. However, despite persistent declines in sequencing costs, population-based rare variant detection across large genomic target regions remains cost-prohibitive for most investigators. In addition, DNA samples are often precious, and hybridization methods typically require large amounts of input DNA. Pooled sample DNA sequencing is a cost- and time-efficient strategy for surveying populations of individuals for rare variants. We set out to 1) create a scalable multiplexing method for custom capture, with or without individual DNA indexing, that is amenable to low amounts of input DNA and 2) expand the functionality of the SPLINTER algorithm for calling substitutions, insertions and deletions across either candidate genes or the entire exome by integrating the variant calling algorithm with the dynamic programming aligner Novoalign.
We report methodology for pooled hybridization capture with pre-enrichment, indexed multiplexing of up to 48 individuals or non-indexed pooled sequencing of up to 92 individuals with as little as 70 ng of DNA per person. Modified solid-phase reversible immobilization bead purification strategies eliminate sample transfers from sonication in 96-well plates through adapter ligation, resulting in 50% less library preparation reagent consumption. Custom Y-shaped adapters containing novel 7 base pair index sequences with a Hamming distance of ≥2 were ligated directly onto fragmented source DNA, eliminating the need for PCR to incorporate indexes; this was followed by a custom blocking strategy that uses a single oligonucleotide regardless of index sequence. Raw reads were aligned against the entire genome using Novoalign, followed by variant calling with SPLINTER for non-indexed pools or SAMtools for indexed samples. With these pipelines, we find sensitivity and specificity of 99.4% and 99.7%, respectively, for pooled exome sequencing. Sensitivity, and to a lesser degree specificity, proved to be a function of coverage. For rare variants (≤2% minor allele frequency), we achieved sensitivity and specificity of ≥94.9% and ≥99.99%, respectively, for custom capture of 2.5 Mb in multiplexed libraries of 22-48 individuals with only ≥5-fold coverage/chromosome; these values improved to ≥98.7% and 100% with 20-fold coverage/chromosome.
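The index design constraint mentioned above (7 base pair indexes with a pairwise Hamming distance of ≥2) can be checked with a few lines of Python. This is a minimal sketch, not the authors' design procedure: the function names and the four example index sequences are hypothetical and are not the indexes used in the study. A minimum distance of 2 means that a single substitution error in an index read cannot turn one valid index into another, so such errors can be detected, although not corrected.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes: list[str]) -> int:
    """Smallest Hamming distance over all pairs in an index set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def indexes_are_valid(indexes: list[str], min_distance: int = 2) -> bool:
    """True if every pair of indexes differs in at least min_distance positions."""
    return min_pairwise_distance(indexes) >= min_distance


# Hypothetical 7 bp indexes for illustration only (NOT the published set).
example_indexes = ["ACGTACG", "ACGTGTG", "TTGACCA", "GATCGAT"]

if __name__ == "__main__":
    print(min_pairwise_distance(example_indexes))
    print(indexes_are_valid(example_indexes, min_distance=2))
```

The exhaustive pairwise comparison is quadratic in the number of indexes, which is negligible for a set sized for 48 samples.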
This highly scalable methodology enables accurate rare variant detection, with or without individual DNA sample indexing, while reducing both the amount of source DNA required and total costs. The savings come from lower hybridization reagent consumption, multi-sample sonication in a standard PCR plate, multiplexed pre-enrichment pooling with a single hybridization, and the lower sequencing coverage required to obtain high sensitivity.
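To put the per-chromosome coverage figures in context, the arithmetic below converts coverage/chromosome into total read depth for a pool and gives the expected allele fraction of a variant carried by a single heterozygous individual. It is an illustrative sketch under stated assumptions, not part of the reported pipeline: it assumes diploid autosomal loci and equal representation of every individual in the pool, and the function names are hypothetical.

```python
def pool_chromosomes(n_individuals: int) -> int:
    """Chromosome copies in a pool, assuming diploid autosomal loci."""
    return 2 * n_individuals


def total_depth(n_individuals: int, coverage_per_chromosome: float) -> float:
    """Total read depth at a site needed to reach a per-chromosome coverage."""
    return pool_chromosomes(n_individuals) * coverage_per_chromosome


def singleton_allele_fraction(n_individuals: int) -> float:
    """Expected read fraction for a variant on one chromosome in the pool,
    assuming all individuals contribute equal amounts of DNA."""
    return 1.0 / pool_chromosomes(n_individuals)


if __name__ == "__main__":
    # Pool sizes and coverage levels taken from the abstract.
    for n, cov in [(48, 5), (48, 20), (22, 5)]:
        print(f"{n} individuals at {cov}-fold/chromosome: "
              f"{total_depth(n, cov):.0f}x total depth")
    print(f"Single heterozygote in a pool of 92: "
          f"{singleton_allele_fraction(92):.4%} of reads")
```

Under these assumptions, a single heterozygous carrier in a 92-person pool contributes 1 of 184 chromosomes, which is why per-chromosome coverage rather than raw depth is the relevant quantity for sensitivity.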