Yuan Xiguo, Miller David J, Zhang Junying, Herrington David, Wang Yue
School of Computer Science and Technology, Xidian University, Xi'an, P.R. China.
J Comput Biol. 2012 Jan;19(1):42-54. doi: 10.1089/cmb.2010.0188. Epub 2011 Dec 9.
Simulation studies in population genetics help clarify how various evolutionary and demographic scenarios shape sequence variation and sequence patterns, and they permit investigators to better assess and design analytical methods for studying disease-associated genetic factors. To facilitate these studies, it is imperative to develop simulators that can accurately generate complex genomic data under various genetic models. Currently, a number of efficient simulation software packages for large-scale genomic data are available, and new simulation programs with more sophisticated capabilities and features continue to emerge. In this article, we review the three basic simulation frameworks (coalescent, forward, and resampling) and some of the existing simulators that fall under these frameworks, comparing them with respect to the evolutionary and demographic scenarios they support, their computational complexity, and their specific applications. Additionally, we address some limitations in current simulation algorithms and discuss future challenges in the development of more powerful simulation tools.