Laboratoire D'Ecologie Alpine (LECA), Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, 38000, Grenoble, France.
Department of Ecology and Evolution, UNIL-Sorge, University of Lausanne, 1015, Biophore, Lausanne, Switzerland.
Mol Genet Genomics. 2021 Mar;296(2):457-471. doi: 10.1007/s00438-020-01756-9. Epub 2021 Jan 20.
Next-generation sequencing technologies have opened a new era of research in population genetics. With these new sequencing opportunities, the use of restriction enzyme-based genotyping techniques, such as restriction site-associated DNA sequencing (RAD-seq) and double-digest RAD sequencing (ddRAD-seq), has increased dramatically over the last decade. The laboratory and bioinformatic parameters of enzyme-based techniques, from DNA sampling to SNP calling, have been investigated in the literature. However, the impact of those parameters on downstream analyses and biological results is less well documented. In this study, we investigated the effects of several pre- and post-sequencing settings on ddRAD-seq results for two biological systems: a complex of butterfly species (Coenonympha sp.) and several populations of common beech (Fagus sylvatica). Our results suggest that pre-sequencing parameters (i.e., DNA quantity and number of PCR cycles during library preparation) have a significant impact on the number of recovered reads and SNPs, on the number of unique alleles, and on individual heterozygosity. Similarly, we found that post-sequencing settings (i.e., clustering and minimum coverage thresholds) influenced locus reconstruction (e.g., number of loci, mean coverage) and SNP calling (e.g., number of SNPs, heterozygosity) but had only a marginal impact on downstream analyses (e.g., measures of genetic differentiation, estimation of individual admixture, and demographic inferences). In addition, replication analyses confirmed the reproducibility of the ddRAD-seq procedure. Overall, this study assesses how sensitive ddRAD-seq data are to pre- and post-sequencing protocols, and illustrates the robustness of the technique for population genetic studies.
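To make the role of a minimum coverage threshold concrete, the following is a minimal sketch, not the pipeline used in the study. It assumes hypothetical per-locus read counts for one individual and a deliberately naive genotype caller (any locus with reads for both alleles is called heterozygous). The function names and the toy data are illustrative assumptions only.

```python
# Illustrative sketch only: a naive genotype caller on made-up read counts.
# It is NOT the authors' method; real ddRAD-seq pipelines use statistical
# genotype models rather than simple presence/absence of alleles.

def call_genotype(ref_reads, alt_reads, min_coverage):
    """Return 'het', 'hom', or None when total depth is below min_coverage."""
    depth = ref_reads + alt_reads
    if depth < min_coverage:
        return None  # locus is treated as missing data
    if ref_reads > 0 and alt_reads > 0:
        return "het"
    return "hom"


def individual_heterozygosity(read_counts, min_coverage):
    """Return (number of called loci, proportion of heterozygous calls)."""
    calls = [call_genotype(r, a, min_coverage) for r, a in read_counts]
    called = [c for c in calls if c is not None]
    if not called:
        return 0, None  # nothing passes the threshold
    n_het = sum(1 for c in called if c == "het")
    return len(called), n_het / len(called)


# Hypothetical (reference reads, alternate reads) at six loci for one individual.
toy_counts = [(5, 4), (12, 0), (2, 1), (0, 3), (7, 6), (1, 0)]

for threshold in (1, 5, 10):
    n_loci, het = individual_heterozygosity(toy_counts, threshold)
    print(f"min coverage {threshold}: {n_loci} loci called, heterozygosity {het}")
```

On this toy input, raising the threshold reduces the number of called loci and shifts the heterozygosity estimate, which is the kind of sensitivity the abstract reports for SNP calling.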