Composite likelihood estimation of demographic parameters.

Author information

Garrigan Daniel

Affiliation

Department of Biology, University of Rochester, Rochester, New York, USA.

Publication information

BMC Genet. 2009 Nov 12;10:72. doi: 10.1186/1471-2156-10-72.

DOI: 10.1186/1471-2156-10-72

PMID: 19909534

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2783031/

Abstract

BACKGROUND

Most existing likelihood-based methods for fitting historical demographic models to DNA sequence polymorphism data do not scale feasibly up to the level of whole-genome data sets. Computational economies can be achieved by incorporating two forms of pseudo-likelihood: composite and approximate likelihood methods. Composite likelihood enables scaling up to large data sets because it takes the product of marginal likelihoods as an estimator of the likelihood of the complete data set. This approach is especially useful when a large number of genomic regions constitutes the data set. Additionally, approximate likelihood methods can reduce the dimensionality of the data by summarizing the information in the original data by either a sufficient statistic or a set of statistics. Both composite and approximate likelihood methods hold promise for analyzing large data sets or for use in situations where the underlying demographic model is complex and has many parameters. This paper considers a simple demographic model of allopatric divergence between two populations, in which one of the populations is hypothesized to have experienced a founder event, or population bottleneck. A large resequencing data set from human populations is summarized by the joint frequency spectrum, a matrix that tabulates, across the genome, the frequencies of derived bases in two populations. A Bayesian Metropolis-coupled Markov chain Monte Carlo (MCMCMC) method for parameter estimation is developed that uses both composite and approximate likelihood methods and is applied to the three different pairwise combinations of the human population resequencing data. The accuracy of the method is also tested on data sets sampled from a simulated population model with known parameters.
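To make the two ideas in this paragraph concrete, the following minimal sketch tabulates a joint frequency spectrum from per-site derived-allele counts and then forms a composite log-likelihood as a sum of per-region log-likelihoods (the logarithm of a product of marginal likelihoods). The function names, the input layout, and the multinomial form of the per-region likelihood are illustrative assumptions; the abstract does not specify how the paper computes its marginal likelihoods.

```python
import math


def joint_frequency_spectrum(derived1, derived2, n1, n2):
    """Tabulate sites by derived-allele count in two samples.

    derived1[k] and derived2[k] are the derived-allele counts at site k
    in samples of n1 and n2 chromosomes. Entry [i][j] of the result is
    the number of sites with i derived copies in sample 1 and j in sample 2.
    """
    jfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(derived1, derived2):
        jfs[i][j] += 1
    return jfs


def region_log_likelihood(jfs, probs):
    """Multinomial log-likelihood of one region's spectrum.

    ASSUMPTION: probs[i][j] is the model-expected probability of cell
    (i, j); the multinomial coefficient is dropped because it does not
    depend on the model parameters.
    """
    total = 0.0
    for row_obs, row_p in zip(jfs, probs):
        for obs, p in zip(row_obs, row_p):
            if obs > 0:
                total += obs * math.log(p)
    return total


def composite_log_likelihood(region_spectra, probs):
    """Sum per-region log-likelihoods, treating regions as independent."""
    return sum(region_log_likelihood(jfs, probs) for jfs in region_spectra)


# Four sites in two samples of two chromosomes each.
example = joint_frequency_spectrum([1, 0, 2, 1], [0, 1, 2, 0], 2, 2)
```

Summing log-likelihoods across regions is only equivalent to the full likelihood if the regions are statistically independent, which is the linkage equilibrium assumption the abstract flags in its conclusion.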

RESULTS

The Bayesian MCMCMC method also estimates the ratio of effective population size for the X chromosome versus that of the autosomes. The method is shown to estimate, with reasonable accuracy, demographic parameters from three simulated data sets that vary in the magnitude of a founder event and a skew in the effective population size of the X chromosome relative to the autosomes. The behavior of the Markov chain is also examined and shown to converge to its stationary distribution, while also showing high levels of parameter mixing. The analysis of three pairwise comparisons of sub-Saharan African human populations with non-African human populations does not provide unequivocal support for a strong non-African founder event from these nuclear data. The estimates do, however, suggest a skew in the ratio of X chromosome to autosome effective population size that is greater than one. However, in all three cases, the 95% highest posterior density interval for this ratio does include three-fourths, the value expected under an equal breeding sex ratio.
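The three-fourths benchmark follows from the standard expressions for effective size with unequal numbers of breeding females and males, and a highest posterior density interval can be read off a set of posterior draws as the narrowest window holding the requested share of them. The sketch below shows both calculations; the function names are hypothetical, and the sample values are made-up numbers for illustration, not results from the paper.

```python
import math


def xa_ratio(n_f, n_m):
    """Expected X-to-autosome effective size ratio.

    Uses the standard results N_A = 4*Nf*Nm / (Nf + Nm) and
    N_X = 9*Nf*Nm / (2*Nf + 4*Nm) for n_f breeding females and
    n_m breeding males, which simplify to the expression below.
    """
    return 9.0 * (n_f + n_m) / (8.0 * (n_f + 2.0 * n_m))


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    start = min(range(n - k + 1), key=lambda s: xs[s + k - 1] - xs[s])
    return xs[start], xs[start + k - 1]


def contains(interval, value):
    """True if value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


equal_sex_ratio = xa_ratio(1000, 1000)   # three-fourths
female_biased = xa_ratio(3000, 1000)     # above three-fourths
```

With equal numbers of breeding females and males the ratio is exactly 3/4; an excess of breeding females pushes it toward its upper limit of 9/8, which is why an estimate above three-fourths is read as a skew.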

CONCLUSION

The implementation of composite and approximate likelihood methods in a framework that includes MCMCMC demographic parameter estimation shows great promise for being flexible and computationally efficient enough to scale up to the level of whole-genome polymorphism and divergence analysis. Further work must be done to characterize the effects of assuming linkage equilibrium among genomic regions, an assumption that is crucial to the validity of the composite likelihood method.
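For readers unfamiliar with Metropolis coupling, a generic version of the sampler can be sketched as below: several chains run in parallel, each targeting the posterior raised to a different power, and adjacent chains occasionally exchange states so that the unheated chain can cross between modes. This is a textbook illustration on a one-dimensional toy target, not the paper's implementation; the function name, the heating schedule in `temps`, the Gaussian proposal, and the default settings are all assumptions.

```python
import math
import random


def metropolis_coupled_mcmc(log_post, init, temps=(1.0, 0.5, 0.25),
                            steps=5000, step_size=1.0, seed=1):
    """Return draws from the unheated chain (temps[0] must be 1.0).

    log_post is the log of the unnormalized posterior density.
    At least two chains are required so that a swap can be proposed.
    """
    rng = random.Random(seed)
    states = [init] * len(temps)
    cold_draws = []
    for _ in range(steps):
        # Within-chain Metropolis update; chain c targets posterior ** temps[c].
        for c, beta in enumerate(temps):
            proposal = states[c] + rng.gauss(0.0, step_size)
            log_ratio = beta * (log_post(proposal) - log_post(states[c]))
            if math.log(1.0 - rng.random()) < log_ratio:
                states[c] = proposal
        # Propose exchanging the states of one random pair of adjacent chains.
        a = rng.randrange(len(temps) - 1)
        b = a + 1
        log_swap = (temps[a] - temps[b]) * (log_post(states[b]) - log_post(states[a]))
        if math.log(1.0 - rng.random()) < log_swap:
            states[a], states[b] = states[b], states[a]
        cold_draws.append(states[0])
    return cold_draws


# Toy target: standard normal log-density up to an additive constant.
draws = metropolis_coupled_mcmc(lambda x: -0.5 * x * x, init=0.0)
```

In an application like the one described here, `log_post` would be the composite log-likelihood plus the log prior, and only the draws from the unheated chain would be used for inference.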

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af89/2783031/e3a9eaeb4205/1471-2156-10-72-1.jpg
