Marmesat Elena, Soriano Laura, Mazzoni Camila J, Sommer Simone, Godoy José A
Department of Integrative Ecology, Estación Biológica de Doñana (CSIC), Sevilla, Spain.
Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany.
PLoS One. 2016 Jun 13;11(6):e0157402. doi: 10.1371/journal.pone.0157402. eCollection 2016.
The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers, designed to match all expected variants of the sequences flanking the region of interest. This approach often introduces PCR biases that lead to an unbalanced representation of targets in high-throughput sequencing libraries and, ultimately, to incomplete detection of the targeted alleles. Here we confirm this result and propose two amplification strategies to alleviate the problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance with that of the conventional single PCR with highly degenerate primers, using the MHC class I of the Iberian lynx as a model. Both novel approaches performed similarly well and better than the conventional approach. They scored significantly more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at the population level (13 vs 12). Finally, we link each allele's amplification efficiency to the primer mismatches in its flanking sequences and show that the ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods makes it possible to attain more complete allelic profiles at lower coverage, improving confidence in downstream analyses and subsequent applications.
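The two quantities the abstract relies on, primer degeneracy and primer-template mismatches, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the example sequences are hypothetical, the primer is written with standard IUPAC ambiguity codes, and the binding site is assumed to be given in the same orientation and length as the primer.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct non-degenerate oligos encoded by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer):
    """List every non-degenerate oligo encoded by a degenerate primer."""
    choices = (IUPAC[code] for code in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


def mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code.

    Assumes `site` is written in the primer's orientation and has equal length.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        base not in IUPAC[code]
        for code, base in zip(primer.upper(), site.upper())
    )


# Hypothetical primer and flanking sequences, for illustration only.
primer = "GCTRYCAGNT"
print(degeneracy(primer))                # 2 * 2 * 4 = 16 oligo variants
print(mismatches(primer, "GCTACCAGAT"))  # fully covered site
print(mismatches(primer, "GCTCCCAGAA"))  # site with two uncovered positions
```

In this framing, a highly degenerate primer is one with a large `degeneracy` value, and a pool of non-degenerate primers is an explicit list of oligos (degeneracy 1 each) chosen to minimise `mismatches` against the known flanking sequences.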