Department of Evolutionary Biology, EBC, Uppsala University, Norbyvägen 18D, Uppsala SE-75236, Sweden.
BMC Genet. 2012 Mar 27;13:22. doi: 10.1186/1471-2156-13-22.
The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years.
We evaluated the performance of the ABC approach for three 'population divergence' models, similar to the 'isolation with migration' model, when the data consist of several hundred thousand SNPs typed for multiple individuals. To do so, we simulated data from known demographic models, used the ABC approach to infer the demographic parameters of interest, and compared the inferred values to the true parameter values that were used to generate the hypothetical "observed" data. For all three models, the ABC approach inferred most demographic parameters, for example population divergence times and past population sizes, quite well and with narrow credible intervals, but some parameters, such as present population sizes and migration rates, were more difficult to infer. We compared the ability of different summary statistics, including haplotype- and LD-based statistics, to infer demographic parameters, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of the information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC.
We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that it can provide accurate and precise inference of demographic parameters from these data. This suggests that the ABC approach will be a useful tool for analyzing large genome-wide datasets.
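The evaluation procedure described above (simulate "observed" data from a known parameter value, run ABC, and compare the estimate and its credible interval to the truth) can be illustrated with a minimal rejection-sampling sketch. Everything in it is an assumption made for illustration: the toy Gaussian simulator, the single parameter `theta`, the uniform prior, the two summary statistics, and all function names are hypothetical and are not the demographic models, statistics, or software used in the study.

```python
import random
import statistics

def simulate(theta, n_loci, rng):
    """Toy simulator (assumption): one noisy value per locus, centred on theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n_loci)]

def summarize(data):
    """Two summary statistics: mean and standard deviation across loci."""
    return (statistics.fmean(data), statistics.pstdev(data))

def distance(s1, s2):
    """Euclidean distance between two vectors of summary statistics."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5

def abc_rejection(observed, n_loci, n_sims, n_keep, rng):
    """Draw theta from the prior, simulate, and keep the n_keep closest draws."""
    s_obs = summarize(observed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, 10.0)  # uniform prior (assumption)
        s_sim = summarize(simulate(theta, n_loci, rng))
        draws.append((distance(s_obs, s_sim), theta))
    draws.sort()  # smallest distance first
    return sorted(theta for _, theta in draws[:n_keep])

def credible_interval(posterior, level=0.95):
    """Equal-tailed interval read off a sorted posterior sample."""
    lo = int((1 - level) / 2 * len(posterior))
    hi = len(posterior) - 1 - lo
    return posterior[lo], posterior[hi]

rng = random.Random(1)
TRUE_THETA = 4.0  # known value used to generate the "observed" data
observed = simulate(TRUE_THETA, 100, rng)
posterior = abc_rejection(observed, n_loci=100, n_sims=3000, n_keep=60, rng=rng)
estimate = statistics.median(posterior)
ci_low, ci_high = credible_interval(posterior)
print(f"true={TRUE_THETA} estimate={estimate:.2f} 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

In this toy setting, raising `n_loci` makes the summary statistics less noisy, which tends to narrow the accepted sample. That is only a loose analogue of the effect of adding loci reported above.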