Fung Tak, Keenan Kevin
National University of Singapore, Department of Biological Sciences, Singapore, Singapore ; Queen's University Belfast, School of Biological Sciences, Belfast, Northern Ireland, United Kingdom.
Queen's University Belfast, Institute for Global Food Security, School of Biological Sciences, Belfast, Northern Ireland, United Kingdom.
PLoS One. 2014 Jan 21;9(1):e85925. doi: 10.1371/journal.pone.0085925. eCollection 2014.
The estimation of population allele frequencies using sample data forms a central component of studies in population genetics. These estimates can be used to test hypotheses on the evolutionary processes governing changes in genetic variation among populations. However, existing studies frequently do not account for sampling uncertainty in these estimates, thus compromising their utility. Incorporation of this uncertainty has been hindered by the lack of a method for constructing confidence intervals containing the population allele frequencies, for the general case of sampling from a finite diploid population of any size. In this study, we address this important knowledge gap by presenting a rigorous mathematical method to construct such confidence intervals. For a range of scenarios, the method is used to demonstrate that for a particular allele, in order to obtain accurate estimates within 0.05 of the population allele frequency with high probability (≥ 95%), a sample size of > 30 is often required. This analysis is augmented by an application of the method to empirical sample allele frequency data for two populations of the checkerspot butterfly (Melitaea cinxia L.), occupying meadows in Finland. For each population, the method is used to derive ≥ 98.3% confidence intervals for the population frequencies of three alleles. These intervals are then used to construct two joint ≥ 95% confidence regions, one for the set of three frequencies for each population. These regions are in turn used to derive a ≥ 95% confidence interval for Jost's D, a measure of genetic differentiation between the two populations. Overall, the results demonstrate the practical utility of the method for informing sampling design and for accounting for sampling uncertainty in studies of population genetics, which is important for scientific hypothesis-testing and for risk-based natural resource management.
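The step from three ≥ 98.3% intervals to a joint ≥ 95% region is consistent with a Bonferroni-type bound (1 - 0.05/3 ≈ 0.983), although the abstract does not name the correction, so that reading is an assumption. The Python sketch below illustrates that arithmetic, together with the standard parametric form of Jost's D for a single locus. It does not reproduce the paper's finite-population interval construction, and the function names and allele frequencies are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def bonferroni_level(joint_level: float, k: int) -> float:
    """Per-interval confidence level such that k intervals hold jointly
    with probability at least joint_level (Bonferroni bound; assumed here,
    not stated in the abstract)."""
    return 1.0 - (1.0 - joint_level) / k


def jost_d(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Jost's D for two populations at one locus, computed from allele
    frequency vectors that each sum to 1 (no small-sample bias correction)."""
    n = 2  # number of populations
    # Mean within-population heterozygosity H_S.
    hs = 1.0 - (sum(p * p for p in p1) + sum(p * p for p in p2)) / n
    # Heterozygosity of the pooled (mean) allele frequencies H_T.
    ht = 1.0 - sum(((a + b) / n) ** 2 for a, b in zip(p1, p2))
    return (ht - hs) / (1.0 - hs) * n / (n - 1)


# Hypothetical allele frequencies, for illustration only.
pop_a = [0.5, 0.3, 0.2]
pop_b = [0.2, 0.3, 0.5]

print(f"per-allele level: {bonferroni_level(0.95, 3):.4f}")
print(f"Jost's D: {jost_d(pop_a, pop_b):.4f}")
```

Evaluating `jost_d` at the corners of the two joint confidence regions, rather than at point estimates, is one way to bound D, but how the paper actually propagates the regions into an interval for D is not described in the abstract.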