Shlyakhter Ilya, Sabeti Pardis C, Schaffner Stephen F
Broad Institute of MIT and Harvard, MA 02142 and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
Bioinformatics. 2014 Dec 1;30(23):3427-9. doi: 10.1093/bioinformatics/btu562. Epub 2014 Aug 22.
Efficient simulation of population genetic samples under a given demographic model is a prerequisite for many analyses. Coalescent theory provides an efficient framework for such simulations, but simulating long regions and high recombination rates remains challenging. Simulators based on a Markovian approximation to the coalescent scale well but do not support selection. No published coalescent simulator that supports selection also supports gene conversion.
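To illustrate why the coalescent is an efficient framework: it traces only the sampled lineages backward in time instead of simulating the whole population. The following is a minimal sketch in Python, assuming a single constant-size population with no recombination, migration or selection, and time measured in coalescent units (each pair of lineages coalesces at rate 1). The function name `coalescence_times` is illustrative and is not part of cosi2.

```python
import random


def coalescence_times(n_samples: int, rng: random.Random) -> list[float]:
    """Return the cumulative times of the n_samples - 1 coalescence events.

    Assumption: standard neutral coalescent, constant population size,
    time in coalescent units (pairwise coalescence rate = 1).
    """
    times = []
    t = 0.0
    k = n_samples  # number of lineages still present
    while k > 1:
        # With k lineages there are k*(k-1)/2 pairs, each coalescing at rate 1.
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)  # exponential waiting time to the next event
        times.append(t)
        k -= 1  # two lineages merge into one
    return times
```

The last entry of the returned list is the time to the most recent common ancestor; under these assumptions its expectation is 2(1 - 1/n) for n samples. Recombination, which the text identifies as the costly part, is deliberately left out of this sketch.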
We describe cosi2, an efficient simulator that supports both exact and approximate coalescent simulation with positive selection. cosi2 improves on the speed of existing exact simulators, and permits further speedup in approximate mode while retaining support for selection. cosi2 supports a wide range of demographic scenarios, including recombination hot spots, gene conversion, population size changes, population structure and migration. cosi2 implements coalescent machinery efficiently by tracking only a small subset of the Ancestral Recombination Graph, sampling only relevant recombination events, and using augmented skip lists to represent tracked genetic segments. To preserve support for selection in approximate mode, the Markov approximation is implemented not by moving along the chromosome but by performing a standard backwards-in-time coalescent simulation while restricting coalescence to node pairs with overlapping or near-overlapping genetic material. We describe the algorithms used by cosi2 and present comparisons with existing selection simulators.
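The restriction of coalescence to node pairs with overlapping or near-overlapping genetic material can be sketched roughly as follows. This is a hedged Python illustration, not cosi2's implementation: it uses plain lists of intervals instead of augmented skip lists, omits recombination and event timing, and the names `gap`, `eligible_pairs`, `merge`, `coalesce_once` and the threshold `max_gap` are all hypothetical.

```python
import itertools
import random

# A segment is a half-open interval [start, end) on the chromosome.
Segment = tuple[float, float]


def gap(a: list[Segment], b: list[Segment]) -> float:
    """Smallest distance between any segment of a and any segment of b.

    Returns 0.0 when the two lineages overlap or touch.
    """
    best = float("inf")
    for s1, e1 in a:
        for s2, e2 in b:
            best = min(best, max(0.0, max(s1, s2) - min(e1, e2)))
    return best


def eligible_pairs(lineages: list[list[Segment]], max_gap: float) -> list[tuple[int, int]]:
    """Index pairs allowed to coalesce: material overlaps or lies within max_gap."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(lineages)), 2)
        if gap(lineages[i], lineages[j]) <= max_gap
    ]


def merge(a: list[Segment], b: list[Segment]) -> list[Segment]:
    """Union of two segment lists, fusing overlapping or touching segments."""
    merged: list[Segment] = []
    for s, e in sorted(a + b):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def coalesce_once(lineages, max_gap, rng):
    """Merge one uniformly chosen eligible pair; return the new lineage list."""
    pairs = eligible_pairs(lineages, max_gap)
    if not pairs:
        return lineages  # no pair is close enough to coalesce
    i, j = rng.choice(pairs)
    rest = [segs for k, segs in enumerate(lineages) if k not in (i, j)]
    return rest + [merge(lineages[i], lineages[j])]
```

With `max_gap = 0.0` only lineages whose material overlaps or touches may coalesce; raising `max_gap` admits near-overlapping pairs and moves the sketch closer to the unrestricted coalescent. Because the simulation still proceeds backward in time over lineages, rather than along the chromosome, a selection model defined over time can in principle remain in place, which is the motivation the text gives for this design.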
A free C++ implementation of cosi2 is available at http://broadinstitute.org/mpg/cosi2.