Robust inference of population size histories from genomic sequencing data.

Affiliations

Department of Physics, University of Chicago, Chicago, Illinois, United States of America.

Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America.

Publication information

PLoS Comput Biol. 2022 Sep 16;18(9):e1010419. doi: 10.1371/journal.pcbi.1010419. eCollection 2022 Sep.

DOI:10.1371/journal.pcbi.1010419

PMID:36112715

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9518926/

Abstract

Unraveling the complex demographic histories of natural populations is a central problem in population genetics. Understanding past demographic events is of general anthropological interest, but is also an important step in establishing accurate null models when identifying adaptive or disease-associated genetic variation. An important class of tools for inferring past population size changes from genomic sequence data are Coalescent Hidden Markov Models (CHMMs). These models make efficient use of the linkage information in population genomic datasets by using the local genealogies relating sampled individuals as latent states that evolve along the chromosome in an HMM framework. Extending these models to large sample sizes is challenging, since the number of possible latent states increases rapidly. Here, we present our method CHIMP (CHMM History-Inference Maximum-Likelihood Procedure), a novel CHMM method for inferring the size history of a population. It can be applied to large samples (hundreds of haplotypes) and only requires unphased genomes as input. The two implementations of CHIMP that we present here use either the height of the genealogical tree (TMRCA) or the total branch length, respectively, as the latent variable at each position in the genome. The requisite transition and emission probabilities are obtained by numerically solving certain systems of differential equations derived from the ancestral process with recombination. The parameters of the population size history are subsequently inferred using an Expectation-Maximization algorithm. In addition, we implement a composite likelihood scheme to allow the method to scale to large sample sizes. We demonstrate the efficiency and accuracy of our method in a variety of benchmark tests using simulated data and present comparisons to other state-of-the-art methods. 
Specifically, our implementation using TMRCA as the latent variable shows comparable performance and provides accurate estimates of effective population sizes in intermediate and ancient times. Our method is agnostic to the phasing of the data, which makes it a promising alternative in scenarios where high-quality data is not available, and has potential applications for pseudo-haploid data.
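The abstract describes CHIMP only at a high level. To make the general coalescent-HMM idea concrete (a discretized TMRCA as the hidden state along the genome, a piecewise-constant population size history, and an HMM likelihood over observed sites), here is a minimal pairwise sketch in Python. It is not CHIMP and should not be read as the paper's method: the transition rule (with a hypothetical per-site probability `rho`, the TMRCA bin is redrawn from its prior) is a deliberately crude stand-in for the ODE-derived transition probabilities described above, the mutation parameter `theta`, the time units, and all function names are hypothetical, each bin's emission uses a single representative time, and the code only evaluates the likelihood rather than running Expectation-Maximization.

```python
import math


def tmrca_bin_probs(boundaries, sizes):
    """Prior probability that a pair's TMRCA falls in each time bin.

    boundaries: bin start times 0 = t_0 < t_1 < ... < t_{K-1}; the last bin is open-ended.
    sizes: relative population size N_k in bin k; a pair coalesces at rate 1 / N_k.
    """
    probs = []
    survive = 1.0  # P(T > t_k): the pair has not yet coalesced at the start of bin k
    for k, n_k in enumerate(sizes):
        if k + 1 < len(boundaries):
            width = boundaries[k + 1] - boundaries[k]
            next_survive = survive * math.exp(-width / n_k)
        else:
            next_survive = 0.0  # open-ended last bin absorbs all remaining mass
        probs.append(survive - next_survive)
        survive = next_survive
    return probs


def representative_times(boundaries, sizes):
    """Conditional mean TMRCA within each bin (exponential truncated to the bin)."""
    times = []
    for k, n_k in enumerate(sizes):
        start = boundaries[k]
        if k + 1 < len(boundaries):
            width = boundaries[k + 1] - start
            decay = math.exp(-width / n_k)
            times.append(start + n_k - width * decay / (1.0 - decay))
        else:
            times.append(start + n_k)  # mean of the untruncated exponential tail
    return times


def log_likelihood(obs, boundaries, sizes, theta, rho):
    """Scaled forward algorithm for a toy pairwise coalescent HMM.

    obs: per-site 0/1 list (1 = heterozygous site in one unphased diploid genome).
    theta: hypothetical per-site mutation rate per lineage per unit time.
    rho: hypothetical per-site probability that the TMRCA bin is redrawn
         from its prior (a crude stand-in for real CHMM transitions).
    """
    prior = tmrca_bin_probs(boundaries, sizes)
    p_het = [1.0 - math.exp(-2.0 * theta * t)
             for t in representative_times(boundaries, sizes)]
    alpha = prior[:]
    loglik = 0.0
    for i, x in enumerate(obs):
        if i > 0:
            # Transition T[j][k] = (1 - rho) * [j == k] + rho * prior[k];
            # alpha sums to 1 after scaling, so the redraw term is rho * prior[k].
            alpha = [(1.0 - rho) * a + rho * p for a, p in zip(alpha, prior)]
        # Emission: heterozygous with prob p_het[k], homozygous otherwise.
        alpha = [a * (ph if x else 1.0 - ph) for a, ph in zip(alpha, p_het)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


if __name__ == "__main__":
    boundaries = [0.0, 0.1, 0.5, 1.0, 2.0]
    sizes = [1.0, 0.5, 1.0, 2.0, 1.0]
    obs = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
    print(log_likelihood(obs, boundaries, sizes, theta=0.05, rho=0.01))
```

To infer a size history one would maximize this log-likelihood over `sizes`; the paper does this with Expectation-Maximization over its own model. The redraw-from-prior transition is used here only because its stationary distribution equals the TMRCA prior by construction, which keeps the sketch short; a real CHMM transition makes the new TMRCA depend on the old one.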

[Figure: pcbi.1010419.g001] https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faf5/9518926/116fbc278aa6/pcbi.1010419.g001.jpg

Similar articles

Estimating variable effective population sizes from multiple genomes: a sequentially markov conditional sampling distribution approach.

Genetics. 2013 Jul;194(3):647-62. doi: 10.1534/genetics.112.149096. Epub 2013 Apr 22.

Variational inference using approximate likelihood under the coalescent with recombination.

Genome Res. 2021 Nov;31(11):2107-2119. doi: 10.1101/gr.273631.120. Epub 2021 Aug 23.

The Promise of Inferring the Past Using the Ancestral Recombination Graph.

Genome Biol Evol. 2024 Feb 1;16(2). doi: 10.1093/gbe/evae005.

Demographic inference from multiple whole genomes using a particle filter for continuous Markov jump processes.

PLoS One. 2021 Mar 2;16(3):e0247647. doi: 10.1371/journal.pone.0247647. eCollection 2021.

Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model.

PLoS Genet. 2007 Feb 23;3(2):e7. doi: 10.1371/journal.pgen.0030007. Epub 2006 Nov 30.

Ancestral population genomics using coalescence hidden Markov models and heuristic optimisation algorithms.

Comput Biol Chem. 2015 Aug;57:80-92. doi: 10.1016/j.compbiolchem.2015.02.001. Epub 2015 Mar 5.

Bayesian Nonparametric Inference of Population Size Changes from Sequential Genealogies.

Genetics. 2015 Sep;201(1):281-304. doi: 10.1534/genetics.115.177980. Epub 2015 Jul 28.

Inferring whole-genome histories in large population datasets.

Nat Genet. 2019 Sep;51(9):1330-1338. doi: 10.1038/s41588-019-0483-y. Epub 2019 Sep 2.

An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination.

Genetics. 2011 Apr;187(4):1115-28. doi: 10.1534/genetics.110.125534. Epub 2011 Jan 26.

Cited by

The TMRCA of general genealogies in populations with deterministically varying size.

Theor Popul Biol. 2025 Jul 2;165:1-9. doi: 10.1016/j.tpb.2025.06.002.

Computational Genomics and Its Applications to Anthropological Questions.

Am J Biol Anthropol. 2024 Dec;186 Suppl 78(Suppl 78):e70010. doi: 10.1002/ajpa.70010.

Estimating evolutionary and demographic parameters via ARG-derived IBD.

PLoS Genet. 2025 Jan 8;21(1):e1011537. doi: 10.1371/journal.pgen.1011537. eCollection 2025 Jan.

The TMRCA of general genealogies in populations of variable size.

bioRxiv. 2024 Sep 24:2024.09.19.613917. doi: 10.1101/2024.09.19.613917.

Improved inference of population histories by integrating genomic and epigenomic data.

Elife. 2024 Sep 12;12:RP89470. doi: 10.7554/eLife.89470.

Biases in ARG-Based Inference of Historical Population Size in Populations Experiencing Selection.

Mol Biol Evol. 2024 Jul 3;41(7). doi: 10.1093/molbev/msae118.

Haplotype-based inference of recent effective population size in modern and ancient DNA samples.

Nat Commun. 2023 Dec 1;14(1):7945. doi: 10.1038/s41467-023-43522-6.

Joint inference of evolutionary transitions to self-fertilization and demographic history using whole-genome sequences.

Elife. 2023 May 11;12:e82384. doi: 10.7554/eLife.82384.
