Inference Under a Wright-Fisher Model Using an Accurate Beta Approximation.

Authors

Paula Tataru, Thomas Bataillon, Asger Hobolth

Affiliation

Bioinformatics Research Centre, Aarhus University, Aarhus C 8000, Denmark

Publication information

Genetics. 2015 Nov;201(3):1133-41. doi: 10.1534/genetics.115.179606. Epub 2015 Aug 26.

DOI:10.1534/genetics.115.179606

PMID:26311474

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4649640/

Abstract

The large amount and high quality of genomic data available today enable, in principle, accurate inference of evolutionary histories of observed populations. The Wright-Fisher model is one of the most widely used models for this purpose. It describes the stochastic behavior in time of allele frequencies and the influence of evolutionary pressures, such as mutation and selection. Despite its simple mathematical formulation, exact results for the distribution of allele frequency (DAF) as a function of time are not available in closed analytical form. Existing approximations build on the computationally intensive diffusion limit or rely on matching moments of the DAF. One of the moment-based approximations relies on the beta distribution, which can accurately describe the DAF when the allele frequency is not close to the boundaries (0 and 1). Nonetheless, under a Wright-Fisher model, the probability of being on the boundary can be positive, corresponding to the allele being either lost or fixed. Here we introduce the beta with spikes, an extension of the beta approximation that explicitly models the loss and fixation probabilities as two spikes at the boundaries. We show that the addition of spikes greatly improves the quality of the approximation. We additionally illustrate, using both simulated and real data, how the beta with spikes can be used for inference of divergence times between populations with comparable performance to an existing state-of-the-art method.
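The idea of a beta with spikes can be illustrated with a small sketch. This is not the authors' implementation, only a minimal illustration under stated assumptions: pure neutral drift (no mutation or selection) in a population of n gene copies, with the exact distribution of allele frequency (DAF) obtained by iterating the binomial Wright-Fisher transition matrix, which is only feasible for small n. The loss and fixation probabilities are taken as point masses at 0 and 1, and a beta distribution is moment-matched to the remaining interior mass so that the mixture reproduces the exact mean and variance. The function names `transition_matrix`, `wf_distribution` and `beta_with_spikes` are hypothetical.

```python
import math


def transition_matrix(n):
    """Neutral Wright-Fisher transition probabilities for n gene copies:
    row i is the Binomial(n, i/n) distribution of the next-generation count j."""
    return [
        [math.comb(n, j) * (i / n) ** j * (1 - i / n) ** (n - j) for j in range(n + 1)]
        for i in range(n + 1)
    ]


def wf_distribution(n, i0, generations):
    """Exact distribution of the allele count after `generations` steps,
    starting from i0 copies (assumption: pure drift, no mutation or selection)."""
    P = transition_matrix(n)
    dist = [0.0] * (n + 1)
    dist[i0] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * P[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return dist


def beta_with_spikes(dist):
    """Moment-match a beta-with-spikes summary to a discrete DAF.

    Returns (p0, p1, a, b): point masses at frequency 0 (loss) and 1
    (fixation), plus Beta(a, b) parameters for the interior mass, chosen
    so that the mixture has the same mean and variance as `dist`."""
    n = len(dist) - 1
    p0, p1 = dist[0], dist[n]
    interior = 1.0 - p0 - p1
    mean = sum(w * i / n for i, w in enumerate(dist))
    second = sum(w * (i / n) ** 2 for i, w in enumerate(dist))
    # Moments conditional on 0 < x < 1; the fixation spike contributes x = 1.
    m = (mean - p1) / interior
    v = (second - p1) / interior - m * m
    # Standard beta moment matching: a + b = m(1 - m) / v - 1.
    total = m * (1 - m) / v - 1
    return p0, p1, m * total, (1 - m) * total


if __name__ == "__main__":
    n, i0, t = 20, 5, 10  # 20 gene copies, initial frequency 0.25, 10 generations
    p0, p1, a, b = beta_with_spikes(wf_distribution(n, i0, t))
    print(f"P(loss)={p0:.4f}  P(fixation)={p1:.4f}  Beta(a={a:.3f}, b={b:.3f})")
```

Because the two spikes absorb the probability mass on the boundaries, the beta component only has to describe the interior, which is where a plain beta approximation already performs well. The exact transition matrix is used here only as a small, checkable reference; its cost grows quickly with n, which is why approximations of the DAF are needed in the first place.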

Similar articles

Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data.

Syst Biol. 2017 Jan 1;66(1):e30-e46. doi: 10.1093/sysbio/syw056.

Inference from the stationary distribution of allele frequencies in a family of Wright-Fisher models with two levels of genetic variability.

Theor Popul Biol. 2018 Jul;122:78-87. doi: 10.1016/j.tpb.2018.03.004. Epub 2018 Mar 21.

Inference of Selection from Genetic Time Series Using Various Parametric Approximations to the Wright-Fisher Model.

G3 (Bethesda). 2019 Dec 3;9(12):4073-4086. doi: 10.1534/g3.119.400778.

The multivariate Wright-Fisher process with mutation: Moment-based analysis and inference using a hierarchical Beta model.

Theor Popul Biol. 2016 Apr;108:36-50. doi: 10.1016/j.tpb.2015.11.001. Epub 2015 Nov 29.

Self-contained Beta-with-Spikes approximation for inference under a Wright-Fisher model.

Genetics. 2023 Oct 4;225(2). doi: 10.1093/genetics/iyad092.

Exact results for the probability and stochastic dynamics of fixation in the Wright-Fisher model.

J Theor Biol. 2017 Oct 7;430:64-77. doi: 10.1016/j.jtbi.2017.06.026. Epub 2017 Jun 22.

Exact simulation of conditioned Wright-Fisher models.

J Theor Biol. 2014 Dec 21;363:419-26. doi: 10.1016/j.jtbi.2014.08.027. Epub 2014 Aug 28.

A simple, semi-deterministic approximation to the distribution of selective sweeps in large populations.

Theor Popul Biol. 2015 May;101:40-6. doi: 10.1016/j.tpb.2015.01.004. Epub 2015 Feb 24.

Wright-Fisher diffusion bridges.

Theor Popul Biol. 2018 Jul;122:67-77. doi: 10.1016/j.tpb.2017.09.005. Epub 2017 Oct 6.

Cited by

Genetic drift acts strongly on within-host influenza virus populations during acute infection but does not act alone.

bioRxiv. 2025 Aug 30:2025.08.27.672713. doi: 10.1101/2025.08.27.672713.

MalKinID: A classification model for identifying malaria parasite genealogical relationships using identity-by-descent.

Genetics. 2025 Feb 5;229(2). doi: 10.1093/genetics/iyae197.

A path integral approach for allele frequency dynamics under polygenic selection.

Genetics. 2025 Jan 8;229(1):1-63. doi: 10.1093/genetics/iyae182.

MalKinID: A Likelihood-Based Model for Identifying Malaria Parasite Genealogical Relationships Using Identity-by-Descent.

bioRxiv. 2024 Jul 16:2024.07.12.603328. doi: 10.1101/2024.07.12.603328.

A Path Integral Approach for Allele Frequency Dynamics Under Polygenic Selection.

bioRxiv. 2024 Jun 14:2024.06.14.599114. doi: 10.1101/2024.06.14.599114.

A mechanistic model of gossip, reputations, and cooperation.

Proc Natl Acad Sci U S A. 2024 May 14;121(20):e2400689121. doi: 10.1073/pnas.2400689121. Epub 2024 May 8.

Self-contained Beta-with-Spikes approximation for inference under a Wright-Fisher model.

Genetics. 2023 Oct 4;225(2). doi: 10.1093/genetics/iyad092.

Direct detection of natural selection in Bronze Age Britain.

Genome Res. 2022 Nov-Dec;32(11-12):2057-2067. doi: 10.1101/gr.276862.122. Epub 2022 Oct 31.

Inferring Epistasis from Genetic Time-series Data.

Mol Biol Evol. 2022 Oct 7;39(10). doi: 10.1093/molbev/msac199.

MPL resolves genetic linkage in fitness inference from complex evolutionary histories.

Nat Biotechnol. 2021 Apr;39(4):472-479. doi: 10.1038/s41587-020-0737-3. Epub 2020 Nov 30.
