Verity Robert, Nichols Richard A
Medical Research Council Centre for Outbreak Analysis and Modelling, Imperial College London, London W2 1PG, United Kingdom
Queen Mary University of London, London E1 4NS, United Kingdom.
Genetics. 2016 Aug;203(4):1827-39. doi: 10.1534/genetics.115.180992. Epub 2016 Jun 17.
A key quantity in the analysis of structured populations is the parameter K, which describes the number of subpopulations that make up the total population. Inference of K ideally proceeds via the model evidence, which is equivalent to the likelihood of the model. However, the evidence in favor of a particular value of K cannot usually be computed exactly, and instead programs such as Structure make use of heuristic estimators to approximate this quantity. We show, using simulated data sets small enough that the true evidence can be computed exactly, that these heuristics often fail to estimate the true evidence and that this can lead to incorrect conclusions about K. Our proposed solution is to use thermodynamic integration (TI) to estimate the model evidence. After outlining the TI methodology, we demonstrate the effectiveness of this approach using a range of simulated data sets. We find that TI can be used to obtain estimates of the model evidence that are more accurate and precise than those based on heuristics. Furthermore, estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics. Finally, we test our solution in a reanalysis of a white-footed mouse data set. The TI methodology is implemented for models both with and without admixture in the software MavericK1.0.
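The idea behind thermodynamic integration can be illustrated on a toy problem in which the evidence is known in closed form. The sketch below is not the MavericK implementation and does not involve K or population structure; it assumes a single biallelic locus with allele frequency theta, a uniform Beta(1, 1) prior, and k copies of one allele observed among n sampled gene copies. TI uses the identity log Z = integral from 0 to 1 of E_beta[log L] d beta, where the expectation is taken under the "power posterior" proportional to L^beta times the prior. In this toy model the power posterior is Beta(1 + beta*k, 1 + beta*(n - k)), so it can be sampled directly and no MCMC is needed. The function names, the number of rungs, and the cubic spacing of the beta values are all illustrative assumptions.

```python
import math
import random


def log_likelihood(theta, k, n):
    """Log-likelihood of an ordered sample with k copies of the allele among n."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def exact_log_evidence(k, n):
    """Exact log evidence under a uniform prior: log B(k + 1, n - k + 1)."""
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


def ti_log_evidence(k, n, rungs=30, draws=5000, power=3.0, seed=1):
    """Estimate the log evidence by thermodynamic integration.

    Assumed toy setup: the power posterior at inverse temperature beta is
    Beta(1 + beta*k, 1 + beta*(n - k)), which is sampled exactly.
    """
    rng = random.Random(seed)
    # Rungs are concentrated near beta = 0, where the integrand changes fastest.
    betas = [(i / rungs) ** power for i in range(rungs + 1)]

    means = []
    for beta in betas:
        a = 1.0 + beta * k
        b = 1.0 + beta * (n - k)
        total = 0.0
        for _ in range(draws):
            theta = rng.betavariate(a, b)
            # Guard against log(0) in the (very unlikely) event of a boundary draw.
            theta = min(max(theta, 1e-12), 1.0 - 1e-12)
            total += log_likelihood(theta, k, n)
        # Monte Carlo estimate of E_beta[log L] at this rung.
        means.append(total / draws)

    # Trapezoidal rule over the (unevenly spaced) beta grid.
    return sum(
        0.5 * (betas[i + 1] - betas[i]) * (means[i + 1] + means[i])
        for i in range(rungs)
    )


if __name__ == "__main__":
    k, n = 7, 20
    print("TI estimate of log evidence:", ti_log_evidence(k, n))
    print("Exact log evidence:         ", exact_log_evidence(k, n))
```

Comparing the two printed values gives a quick check of the estimator against the exact answer, in the same spirit as the small simulated data sets described in the abstract. In a realistic structured-population model the power posterior cannot be sampled directly, so each rung would be explored by MCMC instead.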