Bunnefeld Lynsey, Frantz Laurent A F, Lohse Konrad
Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
Animal Breeding and Genomics Centre, Wageningen University, Wageningen 6708 PB, The Netherlands.
Genetics. 2015 Nov;201(3):1157-69. doi: 10.1534/genetics.115.179861. Epub 2015 Sep 3.
The advent of the genomic era has necessitated the development of methods capable of analyzing large volumes of genomic data efficiently. Being able to reliably identify bottlenecks (extreme population size changes of short duration) is not only of interest in the context of speciation and extinction but also matters, as a null model, when inferring selection. Bottlenecks can be detected in polymorphism data via their distorting effect on the shape of the underlying genealogy. Here, we use the generating function of genealogies to derive the probability of mutational configurations in short sequence blocks under a simple bottleneck model. Given a large number of nonrecombining blocks, we can compute maximum-likelihood estimates of the time and strength of the bottleneck. Our method relies on a simple summary of the joint distribution of polymorphic sites: we extend the site frequency spectrum by counting mutations in frequency classes in short sequence blocks. Using linkage information over short distances in this way gives greater power to detect bottlenecks than the site frequency spectrum and potentially opens up a wide range of demographic histories to blockwise inference. Finally, we apply our method to genomic data from a species of pig (Sus cebifrons) endemic to islands in the center and west of the Philippines to estimate whether a bottleneck occurred upon island colonization, and we compare our scheme to Li and Durbin's pairwise sequentially Markovian coalescent (PSMC), both for the pig data and using simulations.
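The blockwise summary described in the abstract, counting mutations per frequency class within each short block, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a polarized 0/1 haplotype matrix (1 = derived allele) with no missing data and blocks of a fixed number of sites; the function name `blockwise_sfs` and its parameters are hypothetical.

```python
from collections import Counter

import numpy as np


def blockwise_sfs(haplotypes, block_len):
    """Tally mutational configurations across fixed-length blocks.

    haplotypes: (n_samples, n_sites) array of 0/1, where 1 marks the
        derived allele (assumed polarized, no missing data).
    block_len: number of sites per block (assumed nonrecombining).

    Returns a Counter mapping a tuple (k_1, ..., k_{n-1}) to the number
    of blocks showing it, where k_i is the number of sites in the block
    whose derived allele is carried by exactly i samples.
    """
    haplotypes = np.asarray(haplotypes)
    n_samples, n_sites = haplotypes.shape
    configs = Counter()
    # Step through non-overlapping blocks; a trailing partial block is dropped.
    for start in range(0, n_sites - block_len + 1, block_len):
        block = haplotypes[:, start:start + block_len]
        derived_counts = block.sum(axis=0)  # derived copies per site
        # Index i of the result = number of sites with i derived copies.
        spectrum = np.bincount(derived_counts, minlength=n_samples + 1)
        # Keep only polymorphic classes 1 .. n-1.
        configs[tuple(int(k) for k in spectrum[1:n_samples])] += 1
    return configs
```

Summing the configuration tuples over all blocks, weighted by how many blocks show each one, recovers the ordinary site frequency spectrum. The per-block tally additionally records which frequency classes co-occur within a block, and this is the short-range linkage information that the abstract credits with the extra power to detect bottlenecks.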