Carvajal-Rodríguez Antonio
Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310 Vigo, Spain.
BMC Bioinformatics. 2008 Apr 30;9:223. doi: 10.1186/1471-2105-9-223.
Simulating DNA sequences is useful in several areas of population biology research. Biological populations can be simulated under different evolutionary genetic models using either backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient because they follow only the history of lineages with surviving offspring in the current population. In contrast, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, interest in forward population genetic simulation is increasing, and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful for understanding the trace left on the DNA by the different interacting evolutionary forces. Here I introduce GenomePop, a forward simulation program that fulfills these requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation.
I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination at the inter- and intra-codon levels, fitness-based selection, and complex demographic scenarios.
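To make the forward strategy concrete, the following is a minimal sketch of a haploid, constant-size forward simulation, not GenomePop's actual algorithm. It assumes a Jukes-Cantor-style mutation step (every base change equally likely), a single crossover per offspring, and multiplicative fitness that penalizes each difference from the ancestral sequence; all function and parameter names (`mutate`, `recombine`, `fitness`, `simulate`, `mu`, `r`, `s`) are hypothetical. Because the crossover breakpoint is drawn at any site, it can fall inside a codon, which is the intra-codon case.

```python
import random

NUCLEOTIDES = "ACGT"

def mutate(seq, mu, rng):
    """Each site changes to one of the three other bases with probability mu."""
    out = list(seq)
    for i, base in enumerate(out):
        if rng.random() < mu:
            out[i] = rng.choice([b for b in NUCLEOTIDES if b != base])
    return "".join(out)

def recombine(a, b, r, rng):
    """With probability r, join a prefix of a to a suffix of b at a random site."""
    if len(a) > 1 and rng.random() < r:
        k = rng.randrange(1, len(a))  # breakpoint may fall inside a codon
        return a[:k] + b[k:]
    return a

def fitness(seq, ancestral, s):
    """Multiplicative fitness: each difference from the ancestor costs (1 - s)."""
    differences = sum(x != y for x, y in zip(seq, ancestral))
    return (1.0 - s) ** differences

def simulate(ancestral, pop_size, generations, mu, r, s, seed=0):
    """Simulate the whole population forward in time; assumes 0 <= s < 1."""
    rng = random.Random(seed)
    pop = [ancestral] * pop_size
    for _ in range(generations):
        # Selection: parents are sampled in proportion to their fitness.
        weights = [fitness(x, ancestral, s) for x in pop]
        parents_a = rng.choices(pop, weights=weights, k=pop_size)
        parents_b = rng.choices(pop, weights=weights, k=pop_size)
        # Offspring: recombine the two parents, then mutate the result.
        pop = [mutate(recombine(a, b, r, rng), mu, rng)
               for a, b in zip(parents_a, parents_b)]
    return pop

if __name__ == "__main__":
    final = simulate("ATGGCCAAGTTT", pop_size=50, generations=100,
                     mu=0.001, r=0.01, s=0.02, seed=42)
    print(len(set(final)), "distinct haplotypes in the final generation")
```

Every individual is processed in every generation, which is why the cost of this strategy grows with population size and sequence length.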
GenomePop has many useful features for simulating SNPs or DNA sequences under complex evolutionary and demographic models, and their combination makes it unique with respect to other simulation tools. It supports forward simulation under General Time Reversible (GTR) mutation or GTRxMG94 codon models with intra-codon recombination; arbitrary, user-defined migration patterns; diploid or haploid models; and constant or variable population sizes, among others. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model, it allows the simulation of recombination hot-spots and the definition of different frequencies in different populations, among other options. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed on its web page [1].
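As an illustration of the GTR mutation model named above, the sketch below builds a standard GTR instantaneous rate matrix from four equilibrium base frequencies and six symmetric exchangeability parameters, using the textbook definition q_ij = r_ij * pi_j for i != j. It is not taken from GenomePop; the function name `gtr_rate_matrix`, the argument names, and the choice to normalize the matrix to one expected substitution per unit time are assumptions made for this example.

```python
from itertools import combinations

BASES = "ACGT"

def gtr_rate_matrix(freqs, exch):
    """Build a normalized GTR rate matrix as a nested dict q[i][j].

    freqs: equilibrium frequency per base, e.g. {"A": 0.3, ...}, summing to 1.
    exch:  symmetric exchangeabilities keyed "AC", "AG", "AT", "CG", "CT", "GT".
    """
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i, j in combinations(BASES, 2):
        rate = exch[i + j]
        q[i][j] = rate * freqs[j]  # rate of change from i to j
        q[j][i] = rate * freqs[i]  # rate of change from j to i
    for i in BASES:
        # Each row of a rate matrix sums to zero.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    # Normalize so the expected substitution rate at equilibrium is 1.
    scale = -sum(freqs[i] * q[i][i] for i in BASES)
    for i in BASES:
        for j in BASES:
            q[i][j] /= scale
    return q

if __name__ == "__main__":
    freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    exch = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
    q = gtr_rate_matrix(freqs, exch)
    for i in BASES:
        print(i, [round(q[i][j], 4) for j in BASES])
```

Time reversibility corresponds to the detailed-balance condition pi_i * q_ij = pi_j * q_ji, which holds by construction because the exchangeabilities are symmetric.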