Department of Biology and the Ecology Center, Utah State University, 5305 Old Main Hill, Logan, UT, 84322-5305, USA.
Wildland Resources Department and the Ecology Center, Utah State University, Logan, UT, 84322, USA.
Mol Ecol Resour. 2017 Nov;17(6):1156-1167. doi: 10.1111/1755-0998.12657. Epub 2017 Mar 9.
Ploidy levels sometimes vary among individuals or populations, particularly in plants. When such variation exists, accurate determination of cytotype can inform studies of ecology or trait variation and is required for population genetic analyses. Here, we propose and evaluate a statistical approach for distinguishing low-level ploidy variants (e.g. diploids, triploids and tetraploids) based on genotyping-by-sequencing (GBS) data. The method infers cytotypes based on observed heterozygosity and the ratio of DNA sequences containing different alleles at thousands of heterozygous SNPs (i.e. allelic ratios). Although the method does not require prior information on ploidy, a reference set of samples with known ploidy can be included in the analysis if it is available. We explore the power and limitations of this method using simulated data sets and GBS data from natural populations of aspen (Populus tremuloides) known to include both diploid and triploid individuals. The proposed method was able to reliably discriminate among diploids, triploids and tetraploids in simulated data sets, and this was true for different levels of genetic diversity, inbreeding and population structure. Power and accuracy were minimally affected by low coverage (i.e. 2×), but sometimes suffered when simulated mixtures of diploids, autotetraploids and allotetraploids were analysed. Cytotype assignments based on the proposed method closely matched those from previous microsatellite and flow cytometry data when applied to GBS data from aspen. An R package (gbs2ploidy) implementing the proposed method is available from CRAN.
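The intuition behind using allelic ratios can be illustrated with a toy likelihood comparison. At a heterozygous SNP, the expected fraction of reads carrying one allele is 1/2 in a diploid, 1/3 or 2/3 in a triploid, and 1/4, 1/2 or 3/4 in a tetraploid. The sketch below is only an illustration of that idea and is not the model used in the paper or in gbs2ploidy. It assumes read counts are binomial around the true dosage ratio, gives equal weight to every heterozygous genotype class within a ploidy, ignores sequencing error and observed heterozygosity, and treats the SNPs as already known to be heterozygous. The names `HET_RATIOS`, `ploidy_logliks` and `call_ploidy` are hypothetical.

```python
import math

# Hypothetical allelic-ratio classes for a heterozygous SNP at each ploidy
# (ratio = expected fraction of reads carrying one of the two alleles).
HET_RATIOS = {
    2: [1 / 2],
    3: [1 / 3, 2 / 3],
    4: [1 / 4, 1 / 2, 3 / 4],
}


def log_binom_pmf(k, n, p):
    """Log-probability of k reads for one allele out of n, with success prob p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def snp_loglik(k, n, ratios):
    """Log-likelihood of one SNP under an equal-weight mixture of ratio classes."""
    terms = [log_binom_pmf(k, n, p) for p in ratios]
    m = max(terms)  # log-sum-exp trick for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms) / len(terms))


def ploidy_logliks(counts):
    """Sum per-SNP log-likelihoods for each candidate ploidy.

    counts: list of (reads_for_allele_A, total_reads) at heterozygous SNPs.
    """
    return {ploidy: sum(snp_loglik(k, n, ratios) for k, n in counts)
            for ploidy, ratios in HET_RATIOS.items()}


def call_ploidy(counts):
    """Return the candidate ploidy with the highest total log-likelihood."""
    ll = ploidy_logliks(counts)
    return max(ll, key=ll.get)


if __name__ == "__main__":
    # Made-up read counts clustered near 1/3 and 2/3, as expected for a triploid.
    print(call_ploidy([(13, 39), (26, 39), (14, 40), (27, 40)]))
```

With real data, thousands of SNPs are pooled, so even low per-SNP depth can accumulate enough evidence; this toy version also shows why diploid/tetraploid mixtures are harder, because the 1/2 class is shared by both.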