Aguirre-Liguori Jonás A, Luna-Sánchez Javier A, Gasca-Pineda Jaime, Eguiarte Luis E
Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City, Mexico.
Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, United States.
Front Genet. 2020 Sep 18;11:870. doi: 10.3389/fgene.2020.00870. eCollection 2020.
Massive parallel sequencing (MPS) is revolutionizing the field of molecular ecology by allowing us to better understand the evolutionary history of populations and species, and to detect genomic regions that could be under selection. However, the economic and computational resources needed create a tradeoff between the number of loci that can be obtained and the number of populations or individuals that can be sequenced. In this work, we analyzed and compared two simulated genomic datasets fitting a hierarchical structure, two extensive empirical genomic datasets, and a dataset comprising microsatellite information. For all datasets, we generated different subsampling designs by changing the number of loci, individuals, populations, and individuals per population to test for deviations in classic population genetics parameters ( , , ). For the empirical datasets, we also analyzed the effect of sampling design on landscape genetic tests (isolation by distance and environment, central abundance hypothesis). We also tested the effect of sampling a different number of populations on the detection of outlier SNPs. We found that summary statistics obtained from the microsatellite dataset are very sensitive to the number of individuals sampled. was particularly sensitive to a low sampling of individuals in the simulated, genomic, and microsatellite datasets. For the empirical and simulated genomic datasets, we found that as long as many populations are sampled, few individuals and loci are needed. For the empirical datasets, we found that increasing the number of populations sampled was important for obtaining precise landscape genetic estimates. Finally, we corroborated that outlier tests are sensitive to the number of populations sampled. We conclude by proposing different sampling designs depending on the objectives.
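The subsampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the data are simulated biallelic genotypes, all function names (`simulate_population`, `expected_heterozygosity`, `subsample_deviation`) are hypothetical, and expected heterozygosity is used only as a stand-in summary statistic, since it is not necessarily one of the parameters evaluated in the study. The sketch draws repeated subsamples of individuals from one population and reports how far the subsample estimate drifts from the full-sample estimate.

```python
import random
from statistics import mean


def simulate_population(n_individuals, allele_freqs, rng):
    """Simulate diploid genotypes at biallelic loci.

    Each individual is a list with one entry per locus: the number of
    copies (0, 1 or 2) of the reference allele, drawn independently
    from the given allele frequencies.
    """
    return [
        [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def expected_heterozygosity(genotypes):
    """Mean of 2p(1-p) across loci, with p estimated from the sample."""
    n = len(genotypes)
    n_loci = len(genotypes[0])
    total = 0.0
    for locus in range(n_loci):
        p = sum(ind[locus] for ind in genotypes) / (2 * n)
        total += 2 * p * (1 - p)
    return total / n_loci


def subsample_deviation(genotypes, n_sub, n_reps, rng):
    """Mean absolute deviation of subsample estimates from the full estimate."""
    full = expected_heterozygosity(genotypes)
    deviations = []
    for _ in range(n_reps):
        sub = rng.sample(genotypes, n_sub)  # sample individuals without replacement
        deviations.append(abs(expected_heterozygosity(sub) - full))
    return mean(deviations)


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumed setup: 200 loci, 50 individuals in a single population.
    freqs = [rng.uniform(0.05, 0.95) for _ in range(200)]
    population = simulate_population(50, freqs, rng)
    for n_sub in (2, 5, 10, 25):
        dev = subsample_deviation(population, n_sub, 100, rng)
        print(f"individuals={n_sub:2d}  mean |deviation|={dev:.4f}")
```

In this toy setting the deviation shrinks as more individuals are drawn. The same loop could be repeated over the number of loci or populations to mimic the other axes of the sampling design.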