Dabi Amjad, Schrider Daniel R
Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Genetics. 2025 Jan 8;229(1):1-57. doi: 10.1093/genetics/iyae180.
Simulations are an essential tool in all areas of population genetic research, used for tasks such as validating theoretical analyses and studying complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can make simulating large populations and genomes prohibitive. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and quantified the deviation in key outcomes (fixation times, allele frequencies, linkage disequilibrium, and the fraction of mutations that fix during the simulation) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases into each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward; thus, it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates is needed to accurately quantify the bias produced by rescaling for a given Q.
In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling procedure's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
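The rescaling procedure described in the abstract (divide the population size by Q, multiply the mutation rate, selection coefficients, and recombination rate by Q) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the `SimParams` container, the `rescale` function, and the parameter values are hypothetical, and a single selection coefficient stands in for a full distribution of fitness effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for the parameters the abstract says are rescaled."""
    pop_size: int               # N, number of individuals
    mutation_rate: float        # mu, per site per generation
    recombination_rate: float   # r, per site per generation
    selection_coeff: float      # s, stand-in for a distribution of fitness effects


def rescale(params: SimParams, q: float) -> SimParams:
    """Scale N down by q and scale mu, r, and s up by q.

    The products N*mu, N*r, and N*s are (approximately) unchanged,
    which is the rationale usually given for this kind of rescaling.
    """
    if q < 1:
        raise ValueError("the rescaling factor Q is assumed to be >= 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recombination_rate=params.recombination_rate * q,
        selection_coeff=params.selection_coeff * q,
    )


# Illustrative values only; they are not taken from the paper.
unscaled = SimParams(pop_size=10_000, mutation_rate=1e-8,
                     recombination_rate=1e-8, selection_coeff=0.01)
scaled = rescale(unscaled, q=10)
```

Here `scaled` simulates 1,000 individuals instead of 10,000 while keeping N*mu, N*r, and N*s the same as in `unscaled`. The abstract's point is that matching these products does not guarantee matching outcomes, so comparing a few replicates of the scaled and unscaled models is still advisable before committing to a value of Q.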