Krukov Ivan, de Sanctis Bianca, de Koning A P Jason
Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, Alberta T2N 1N4, Canada.
Doctoral Program in Biochemistry and Molecular Biology, Bioinformatics Stream, University of Calgary, Calgary, Alberta T2N 1N4, Canada.
Bioinformatics. 2017 May 1;33(9):1416-1417. doi: 10.1093/bioinformatics/btw802.
The simplifying assumptions that are used widely in theoretical population genetics may not always be appropriate for empirical population genetics. General computational approaches that do not require the assumptions of classical theory are therefore quite desirable. One such general approach is provided by the theory of absorbing Markov chains, which can be used to obtain exact results by directly analyzing population genetic Markov models, such as the classic bi-allelic Wright-Fisher model. Although these approaches are sometimes used, they are usually forgone in favor of simulation methods, due to the perception that they are too computationally burdensome. Here we show that, surprisingly, direct analysis of virtually any Markov chain model in population genetics can be made quite efficient by exploiting transition matrix sparsity and by solving restricted systems of linear equations, allowing a wide variety of exact calculations (within machine precision) to be easily and rapidly made on modern workstation computers.
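As an illustration of the direct approach described above, the following minimal sketch computes fixation probabilities for the bi-allelic Wright-Fisher model by solving the restricted linear system (I - Q) b = r, where Q holds transitions among the transient states (1 to n-1 copies) and r holds one-step transitions into fixation. It is not the WFES implementation: the function names, the genic selection parameterization `s`, and the dense Gaussian elimination are assumptions made for a small, self-contained example, whereas the abstract describes exploiting transition matrix sparsity to reach large population sizes.

```python
from math import comb

def wf_transition_row(i, n, s=0.0):
    """Transition probabilities from i copies to 0..n copies (n = 2N gene copies).

    Assumes genic selection with coefficient s (an illustrative choice).
    """
    p = i * (1.0 + s) / (i * (1.0 + s) + (n - i))  # frequency after selection
    return [comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(n + 1)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense)."""
    m = len(b)
    a = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented copy
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        x[r] = (a[r][m] - sum(a[r][c] * x[c] for c in range(r + 1, m))) / a[r][r]
    return x

def fixation_probabilities(n, s=0.0):
    """Fixation probability from each transient state 1..n-1."""
    rows = [wf_transition_row(i, n, s) for i in range(1, n)]
    m = n - 1
    # I - Q restricted to the transient states 1..n-1
    i_minus_q = [[(1.0 if k == l else 0.0) - rows[k][l + 1] for l in range(m)]
                 for k in range(m)]
    r_fix = [row[n] for row in rows]  # one-step probability of fixation
    return solve(i_minus_q, r_fix)

if __name__ == "__main__":
    fix = fixation_probabilities(20)
    print(fix[0])  # neutral case: should be close to 1/20
```

In the neutral case the result can be checked against the known fixation probability i/n for i initial copies. A dense solver like this one is only practical for small n.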
We introduce Wright-Fisher Exact Solver (WFES), a fast and scalable method for direct analysis of Markov chain models in population genetics. WFES can rapidly solve for both long-term and transient behaviours including fixation and extinction probabilities, expected times to fixation or extinction, sojourn times, expected allele age and variance, and others. Our implementation requires only seconds to minutes of runtime on modern workstations and scales to biological population sizes ranging from humans to model organisms.
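Some of the quantities listed above can be read off the fundamental matrix N = (I - Q)^{-1} of the absorbing chain without forming the full inverse. The sketch below, which again assumes a small neutral Wright-Fisher model and uses hypothetical function names rather than anything from WFES, obtains the sojourn times for one starting state by solving a single transposed system (I - Q)^T x = e_i, and the expected times to absorption by solving (I - Q) t = 1.

```python
from math import comb

def transient_block(n):
    """Q: neutral Wright-Fisher transitions among transient states 1..n-1."""
    q = []
    for i in range(1, n):
        p = i / n
        q.append([comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(1, n)])
    return q

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense)."""
    m = len(b)
    a = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented copy
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        x[r] = (a[r][m] - sum(a[r][c] * x[c] for c in range(r + 1, m))) / a[r][r]
    return x

def sojourn_times(n, start):
    """Expected generations spent at 1..n-1 copies, starting from `start` copies.

    This is one row of N, obtained from the transposed system (I - Q)^T x = e.
    """
    q = transient_block(n)
    m = n - 1
    a_t = [[(1.0 if r == c else 0.0) - q[c][r] for c in range(m)] for r in range(m)]
    e = [1.0 if k == start - 1 else 0.0 for k in range(m)]
    return solve(a_t, e)

def absorption_times(n):
    """Expected generations until fixation or extinction from each transient state."""
    q = transient_block(n)
    m = n - 1
    a = [[(1.0 if r == c else 0.0) - q[r][c] for c in range(m)] for r in range(m)]
    return solve(a, [1.0] * m)

if __name__ == "__main__":
    s = sojourn_times(20, start=1)
    t = absorption_times(20)
    print(sum(s), t[0])  # the two values should agree
```

Summing the sojourn times for a given starting state should reproduce the expected time to absorption from that state, which gives a simple internal consistency check.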
The code is available at https://github.com/dekoning-lab/wfes.
Supplementary data are available at Bioinformatics online.