Composite likelihood estimation of demographic parameters.

Author information

Garrigan Daniel

Affiliation

Department of Biology, University of Rochester, Rochester, New York, USA.

Publication information

BMC Genet. 2009 Nov 12;10:72. doi: 10.1186/1471-2156-10-72.

Abstract

BACKGROUND

Most existing likelihood-based methods for fitting historical demographic models to DNA sequence polymorphism data do not scale feasibly to the level of whole-genome data sets. Computational economies can be achieved by incorporating two forms of pseudo-likelihood: composite and approximate likelihood methods. Composite likelihood enables scaling up to large data sets because it takes the product of marginal likelihoods as an estimator of the likelihood of the complete data set. This approach is especially useful when the data set consists of a large number of genomic regions. Additionally, approximate likelihood methods can reduce the dimensionality of the data by summarizing the information in the original data with either a sufficient statistic or a set of statistics. Both composite and approximate likelihood methods hold promise for analyzing large data sets and for situations in which the underlying demographic model is complex and has many parameters. This paper considers a simple demographic model of allopatric divergence between two populations, in which one of the populations is hypothesized to have experienced a founder event, or population bottleneck. A large resequencing data set from human populations is summarized by the joint frequency spectrum, which is a matrix of the genomic frequency spectrum of derived base frequencies in the two populations. A Bayesian Metropolis-coupled Markov chain Monte Carlo (MCMCMC) method for parameter estimation is developed that uses both composite and approximate likelihood methods, and it is applied to the three different pairwise combinations of the human population resequencing data. The accuracy of the method is also tested on data sets sampled from a simulated population model with known parameters.
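The composite-likelihood idea described above can be illustrated with a short sketch. It is not the paper's implementation. It assumes that each genomic region is summarized by its own joint frequency spectrum (JFS), that the demographic model supplies expected JFS cell probabilities (the hypothetical `expected_probs` input, which in practice might be obtained by coalescent simulation), and that sites fall into cells multinomially. The composite log-likelihood is then the sum of per-region log-likelihoods, that is, the log of the product of marginal likelihoods. This is valid only under the assumption of independence (linkage equilibrium) among regions. All function names are illustrative.

```python
import math
from typing import List, Sequence

def joint_frequency_spectrum(derived1: Sequence[int], derived2: Sequence[int],
                             n1: int, n2: int) -> List[List[int]]:
    """Tally sites by (derived-allele count in pop 1, derived-allele count in pop 2).

    n1 and n2 are the numbers of sampled chromosomes in each population.
    """
    jfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(derived1, derived2):
        jfs[i][j] += 1
    return jfs

def region_log_likelihood(jfs: List[List[int]],
                          expected_probs: List[List[float]]) -> float:
    """Multinomial log-likelihood of one region's JFS.

    The multinomial coefficient is omitted because it does not depend on the
    model parameters. expected_probs is a hypothetical model-derived input.
    """
    total = 0.0
    for obs_row, p_row in zip(jfs, expected_probs):
        for count, p in zip(obs_row, p_row):
            if count == 0:
                continue
            if p <= 0.0:
                return float("-inf")  # observed a cell the model deems impossible
            total += count * math.log(p)
    return total

def composite_log_likelihood(region_spectra: List[List[List[int]]],
                             expected_probs: List[List[float]]) -> float:
    """Log of the product of per-region likelihoods (regions treated as independent)."""
    return sum(region_log_likelihood(jfs, expected_probs) for jfs in region_spectra)

# Toy usage: two regions, 2 sampled chromosomes per population, uniform model.
uniform = [[1.0 / 9.0] * 3 for _ in range(3)]
region_a = joint_frequency_spectrum([1, 2, 0], [1, 0, 2], n1=2, n2=2)
region_b = joint_frequency_spectrum([1], [2], n1=2, n2=2)
cl = composite_log_likelihood([region_a, region_b], uniform)
print(cl)
```

For brevity, this toy example keeps the monomorphic corner cells in the spectrum. A real analysis would typically condition on polymorphic sites and renormalize the expected probabilities to match.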

RESULTS

The Bayesian MCMCMC method also estimates the ratio of the effective population size of the X chromosome to that of the autosomes. The method is shown to estimate demographic parameters with reasonable accuracy from three simulated data sets that differ in the magnitude of the founder event and in the skew of the X chromosome effective population size relative to the autosomes. The behavior of the Markov chain is also examined and shown to converge to its stationary distribution while exhibiting high levels of parameter mixing. The analyses of three pairwise comparisons of sub-Saharan African human populations with non-African human populations do not provide unequivocal support from these nuclear data for a strong non-African founder event. The estimates do, however, suggest that the ratio of X chromosome to autosome effective population size is skewed to a value greater than one. Nevertheless, in all three cases, the 95% highest posterior density interval for this ratio includes three-fourths, the value expected under an equal breeding sex ratio.
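The three-fourths benchmark follows from the standard sex-ratio formulas for effective size in an idealized Wright-Fisher population with N_m breeding males and N_f breeding females: N_e(A) = 4 N_m N_f / (N_m + N_f) for autosomes and N_e(X) = 9 N_m N_f / (4 N_m + 2 N_f) for the X chromosome. The sketch below evaluates their ratio exactly, which shows why a ratio above one points to a skewed breeding sex ratio. It is a textbook illustration that assumes sex-ratio skew is the only source of difference. It is not part of the paper's estimator, and the function name is hypothetical.

```python
from fractions import Fraction

def x_to_autosome_ne_ratio(n_males: int, n_females: int) -> Fraction:
    """Exact N_e(X) / N_e(A) from the Wright-Fisher sex-ratio formulas.

    Assumes an idealized population in which the only difference between
    X and autosomes comes from the numbers of breeding males and females.
    """
    ne_autosome = Fraction(4 * n_males * n_females, n_males + n_females)
    ne_x = Fraction(9 * n_males * n_females, 4 * n_males + 2 * n_females)
    return ne_x / ne_autosome

print(x_to_autosome_ne_ratio(500, 500))  # equal sex ratio
print(x_to_autosome_ne_ratio(100, 700))  # 7 females per male
print(x_to_autosome_ne_ratio(100, 900))  # 9 females per male
```

Algebraically, the ratio simplifies to 9(N_m + N_f) / (16 N_m + 8 N_f). It equals exactly 3/4 when the sexes are equal in number and reaches 1 when breeding females outnumber males 7 to 1. As males become very rare, it approaches 9/8. Under this simple model, therefore, a ratio above one requires a strongly female-biased breeding sex ratio.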

CONCLUSION

The implementation of composite and approximate likelihood methods in a framework that includes MCMCMC demographic parameter estimation shows great promise for being flexible and computationally efficient enough to scale up to the level of whole-genome polymorphism and divergence analysis. Further work must be done to characterize the effects of the assumption of linkage equilibrium among genomic regions that is crucial to the validity of applying the composite likelihood method.
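The Metropolis-coupled MCMC (MCMCMC, also called parallel tempering) component of this framework can be illustrated with a generic toy sketch. It is not the paper's sampler. The bimodal target `log_target`, the heating values `betas`, the proposal scaling and the swap scheme (one randomly chosen adjacent pair per iteration) are all assumptions made for illustration. Heated chains (beta < 1) sample a flattened posterior, so they cross between modes easily. Accepted state swaps pass those moves down to the cold chain (beta = 1), and only the cold chain's samples are retained.

```python
import math
import random

def log_target(x: float) -> float:
    """Hypothetical unnormalized log-density: equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)  # log-sum-exp trick avoids underflow far from both modes
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def mc3(log_post, betas, n_iter, seed=0, x0=0.0):
    """Metropolis-coupled MCMC; returns the samples of the cold chain (betas[0] == 1)."""
    rng = random.Random(seed)
    states = [x0] * len(betas)
    cold = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each tempered chain.
        for k, beta in enumerate(betas):
            step = 1.0 / math.sqrt(beta)  # wider proposals for hotter chains
            proposal = states[k] + rng.gauss(0.0, step)
            log_ratio = beta * (log_post(proposal) - log_post(states[k]))
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                states[k] = proposal
        # Propose swapping the states of one random pair of adjacent chains.
        k = rng.randrange(len(betas) - 1)
        log_swap = (betas[k] - betas[k + 1]) * (
            log_post(states[k + 1]) - log_post(states[k])
        )
        if log_swap >= 0 or rng.random() < math.exp(log_swap):
            states[k], states[k + 1] = states[k + 1], states[k]
        cold.append(states[0])
    return cold

samples = mc3(log_target, betas=[1.0, 0.5, 0.25, 0.1], n_iter=20_000, seed=1)
frac_positive = sum(x > 0 for x in samples) / len(samples)
print(f"fraction of cold-chain samples in the right-hand mode: {frac_positive:.2f}")
```

Because the two modes have equal weight, a well-mixed cold chain should spend roughly half its time in each mode. A single untempered random-walk chain started in one mode would rarely cross the low-density region between them.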

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af89/2783031/e3a9eaeb4205/1471-2156-10-72-1.jpg