Palamara Pier Francesco
Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
Bioinformatics. 2016 Oct 1;32(19):3032-4. doi: 10.1093/bioinformatics/btw355. Epub 2016 Jun 16.
Simulation under the coalescent model is ubiquitous in the analysis of genetic data. The rapid growth of real data sets from multiple human populations has led to increasing interest in simulating very large sample sizes at whole-chromosome scales. When the sample size is large, the coalescent model becomes an increasingly inaccurate approximation of the discrete-time Wright-Fisher (DTWF) model. Analytical and computational treatment of the DTWF, however, is generally harder.
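To make the contrast between the two models concrete, the following is a minimal sketch of a backward-in-time DTWF genealogy. It is not ARGON's algorithm: it assumes a haploid population of constant size with no recombination, mutation or migration, and the class and method names (DtwfSketch, stepBack, generationsToMrca) are hypothetical. In each generation every sampled lineage picks a parent uniformly at random, and lineages that pick the same parent merge. With n lineages in a population of N individuals, the expected number of distinct parents is N(1 - (1 - 1/N)^n), so many lineages can merge in a single generation when n is large, whereas the standard coalescent allows only one pairwise merger at a time.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Illustrative haploid DTWF genealogy sketch (hypothetical names, not ARGON code). */
public class DtwfSketch {

    /**
     * Moves the sample back one generation: each lineage draws a parent
     * uniformly from the population, and lineages sharing a parent merge.
     * Returns the number of distinct ancestral lineages that remain.
     */
    static int stepBack(int lineages, int populationSize, Random rng) {
        Set<Integer> parents = new HashSet<>();
        for (int i = 0; i < lineages; i++) {
            parents.add(rng.nextInt(populationSize));
        }
        return parents.size();
    }

    /**
     * Repeats stepBack until a single lineage remains and returns the number
     * of generations back to the most recent common ancestor of the sample.
     */
    static int generationsToMrca(int sampleSize, int populationSize, Random rng) {
        int lineages = sampleSize;
        int generations = 0;
        while (lineages > 1) {
            lineages = stepBack(lineages, populationSize, rng);
            generations++;
        }
        return generations;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so runs are repeatable
        int n = 1000;   // assumed sample size, for illustration only
        int N = 10000;  // assumed constant population size

        int remaining = stepBack(n, N, rng);
        System.out.println("Lineages lost in one generation: " + (n - remaining));
        System.out.println("Generations to MRCA: " + generationsToMrca(n, N, rng));
    }
}
```

Running main with a sample that is a sizable fraction of the population shows many lineages merging within one generation, which is the regime where the coalescent approximation degrades.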
We present a simulator (ARGON) for the DTWF process that scales up to hundreds of thousands of samples and whole-chromosome lengths, with time and memory performance comparable or superior to that of currently available methods for coalescent simulation. The simulator supports arbitrary demographic history, migration, Newick tree output, variable mutation/recombination rates and gene conversion, and efficiently outputs pairwise identical-by-descent sharing data.
ARGON (version 0.1) is written in Java, open source, and freely available at https://github.com/pierpal/ARGON. Contact: ppalama@hsph.harvard.edu
Supplementary data are available at Bioinformatics online.