Ferrari Tessa, Feng Siyuan, Zhang Xinjun, Mooney Jazlyn
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
Laboratory of Genetics, University of Wisconsin, Madison, WI, USA.
Genome Biol Evol. 2025 May 30;17(6). doi: 10.1093/gbe/evaf097.
Scaling is a common practice in population genetic simulations to increase computational efficiency. However, few studies systematically examine the effects of scaling on diversity estimates and the comparability of scaled results to unscaled simulations and empirical data. We investigate the effects of scaling in two species, modern humans and Drosophila melanogaster. These species have stark differences in population size and generation time, necessitating moderate-to-no scaling for humans and dramatic scaling for Drosophila. We determine how coalescence, runtime, memory, estimates of diversity, the site frequency spectra, and linkage disequilibrium are influenced by scaling. We also examine the impact of simulated segment length and burn-in time on these metrics. Our results demonstrate that while computational efficiency improves with scaling, large scaling factors distort genetic diversity and dynamics between genetic variants, resulting in deviations from the intended model and empirical observations. Specifically, strongly scaled simulations may experience stronger negative selection on deleterious mutations, which amplifies background selection and purges linked mutations, leaving only rare strongly deleterious variants in the final population. We additionally show that a heuristic burn-in length of 10N generations is often insufficient for full coalescence in both models and alters expected linkage disequilibrium patterns. Finally, we provide considerations for conducting scaled simulations and offer potential strategies for the mitigation of scaling effects. For most nonmodel species simulations, we advocate for a bespoke scaling strategy drawn from these use cases.
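As an illustration of what scaling by a factor Q typically involves, the sketch below applies the conventional rescaling rule, which the abstract does not spell out and is assumed here: population size and generation counts are divided by Q, while per-generation mutation rate, recombination rate, and selection coefficients are multiplied by Q, so that products such as N·μ, N·r, and N·s are approximately preserved. The names `SimParams` and `scale_params` and all numeric values are hypothetical, chosen for round arithmetic rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for forward-simulation parameters."""
    pop_size: int                # N, number of diploid individuals
    mutation_rate: float         # mu, per site per generation
    recombination_rate: float    # r, per site per generation
    selection_coeff: float       # s, for one class of deleterious mutations
    burn_in_generations: int     # e.g. the 10N heuristic


def scale_params(p: SimParams, q: float) -> SimParams:
    """Rescale parameters by factor q under the assumed conventional rule.

    N and generation counts shrink by q; mu, r, and s grow by q, keeping
    N*mu, N*r, and N*s roughly constant. The simple r*q product is an
    approximation and is capped at 0.5 (free recombination).
    """
    if q < 1:
        raise ValueError("scaling factor q must be >= 1")
    return SimParams(
        pop_size=max(1, round(p.pop_size / q)),
        mutation_rate=p.mutation_rate * q,
        recombination_rate=min(0.5, p.recombination_rate * q),
        selection_coeff=p.selection_coeff * q,
        burn_in_generations=max(1, round(p.burn_in_generations / q)),
    )


# Illustrative values only (not empirical estimates for any species).
unscaled = SimParams(
    pop_size=1_000_000,
    mutation_rate=1e-9,
    recombination_rate=1e-8,
    selection_coeff=-1e-4,
    burn_in_generations=10 * 1_000_000,  # 10N heuristic
)
scaled = scale_params(unscaled, q=100)

# N*s is preserved, but the per-generation s itself is 100 times larger.
print(scaled)
```

Under this rule the per-generation selection coefficient grows in direct proportion to Q, so a large Q can push a weakly deleterious s toward strongly deleterious or lethal values. This is consistent with the abstract's observation that strongly scaled simulations may experience stronger negative selection.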