Hobolth Asger, Uyenoyama Marcy K, Wiuf Carsten
Aarhus University.
Stat Appl Genet Mol Biol. 2008;7(1):Article32. doi: 10.2202/1544-6115.1400. Epub 2008 Oct 30.
Importance sampling or Markov chain Monte Carlo sampling is required for state-of-the-art statistical analysis of population genetics data. The applicability of these sampling-based inference techniques depends crucially on the proposal distribution. In this paper, we discuss importance sampling for the infinite sites model. The infinite sites assumption is attractive because it constrains the number of possible genealogies, thereby allowing for the analysis of larger data sets. We recall the Griffiths-Tavaré and Stephens-Donnelly proposals and emphasize the relation between the latter proposal and exact sampling from the infinite alleles model. We also introduce a new proposal that takes knowledge of the ancestral state into account. The new proposal is derived from a new result on exact sampling from a single site. The methods are illustrated on simulated data sets and the data considered in Griffiths and Tavaré (1994).
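The abstract's remark that importance sampling depends crucially on the proposal distribution can be illustrated with a toy example. The sketch below is not the infinite sites model and implements none of the proposals named in the abstract. It assumes a standard normal target and estimates the tail probability P(X > 4) with two Gaussian proposals. The function names, sample size, threshold, and seed are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(proposal_mean, n=20000, threshold=4.0, seed=1):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Draws come from the proposal N(proposal_mean, 1). Each draw is weighted
    by target density / proposal density, and contributes only if it falls
    above the threshold. Returns the estimate and the effective sample size
    (sum of weights squared divided by the sum of squared weights).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    weights = []
    for _ in range(n):
        x = rng.gauss(proposal_mean, 1.0)
        if x > threshold:
            weights.append(normal_pdf(x) / normal_pdf(x, proposal_mean))
        else:
            weights.append(0.0)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    ess = (total * total / total_sq) if total_sq > 0.0 else 0.0
    return total / n, ess


# Exact tail probability of the standard normal, for comparison.
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))

# Proposal equal to the target: almost no draws reach the tail.
naive, naive_ess = importance_estimate(proposal_mean=0.0)

# Proposal centred on the region of interest: many informative draws.
shifted, shifted_ess = importance_estimate(proposal_mean=4.0)

print("exact  ", exact)
print("naive  ", naive, "ESS", naive_ess)
print("shifted", shifted, "ESS", shifted_ess)
```

With the same number of draws, the proposal centred near the threshold should give a far larger effective sample size than sampling from the target itself. This is the generic effect that motivates designing proposals tailored to the data, here shown only for a one-dimensional Gaussian.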