Center for Computational Biology, University of California, Berkeley, CA 94720, United States of America.
Department of Statistics, University of California, Berkeley, CA 94720, United States of America; Computer Science Division, University of California, Berkeley, CA 94720, United States of America; Chan Zuckerberg Biohub, San Francisco, CA 94158, United States of America.
Theor Popul Biol. 2021 Oct;141:34-43. doi: 10.1016/j.tpb.2021.06.003. Epub 2021 Jun 26.
The ancestral recombination graph (ARG) contains the full genealogical information of the sample, and many population genetic inference problems can be solved using inferred or sampled ARGs. In particular, the waiting distance between tree changes along the genome can be used to make inferences about the distribution and evolution of recombination rates. To this end, we derive an analytic expression for the distribution of waiting distances between tree changes under the sequentially Markovian coalescent (SMC) model and obtain an accurate approximation to the distribution of waiting distances between topology changes. We use these results to show that some recently proposed methods for inferring sequences of trees along the genome produce strongly biased distributions of waiting distances. In addition, we provide a correction for an undercounting problem that affects all available ARG inference methods, thereby facilitating the use of ARG inference methods to estimate temporal changes in the recombination rate.
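As a rough illustration of the waiting-distance idea (not the derivation in this paper), the Python sketch below uses Monte Carlo to estimate the distance from an arbitrary genomic position to the next tree change for a sample of n sequences. It assumes a constant population size, a scaled recombination rate rho = 4Nr per unit of genomic distance, and branch lengths in units of 2N generations. It also assumes that, as under the SMC, every recombination on the current marginal tree changes that tree. Given the total branch length L, the distance is then exponential with rate rho·L/2. All function names are hypothetical. This distance is measured from a random position, so it is not the same as the waiting distance between consecutive tree changes studied in the paper: a random position is more likely to land in a long segment.

```python
import random


def kingman_tree_length(n: int, rng: random.Random) -> float:
    """Total branch length (units of 2N generations) of a Kingman coalescent tree on n leaves."""
    total = 0.0
    for k in range(n, 2 - 1, -1):
        # While k lineages remain, the waiting time is Exp with rate k(k-1)/2,
        # and each of the k lineages accumulates that much branch length.
        t_k = rng.expovariate(k * (k - 1) / 2)
        total += k * t_k
    return total


def sample_distance_to_next_change(n: int, rho: float, rng: random.Random) -> float:
    """Distance from an arbitrary position to the next tree change (illustrative sketch).

    Assumptions (not taken from the paper): rho = 4*N*r per unit distance, and every
    recombination on the current tree changes the tree (true under the SMC, not the
    SMC'), so the distance is Exp(rho * L / 2) given the tree length L.
    """
    length = kingman_tree_length(n, rng)
    return rng.expovariate(rho * length / 2)


def empirical_survival(samples: list, x: float) -> float:
    """Fraction of sampled distances strictly greater than x."""
    return sum(d > x for d in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(1)
    rho = 2.0
    draws = [sample_distance_to_next_change(2, rho, rng) for _ in range(200_000)]
    for x in (0.1, 0.5, 2.0):
        # Compare the simulated survival with the closed form for n = 2.
        print(x, round(empirical_survival(draws, x), 3), round(1 / (1 + rho * x), 3))
```

For n = 2 the tree length is 2T with T ~ Exp(1), so the survival function has the closed form P(D > x) = E[exp(-ρTx)] = 1/(1 + ρx), and the simulated values should approach it. This heavy tail (infinite mean) illustrates why the choice of starting point, whether a random position or a tree-change point, matters when waiting distances are used to infer recombination rates.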