Efficient computation of the joint sample frequency spectra for multiple populations.

Authors

Kamm John A, Terhorst Jonathan, Song Yun S

Affiliations

Department of Statistics, University of California, Berkeley.

Departments of EECS, Statistics, and Integrative Biology, University of California, Berkeley.

Publication information

J Comput Graph Stat. 2017;26(1):182-194. doi: 10.1080/10618600.2016.1159212. Epub 2017 Feb 16.

Abstract

A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.

Similar articles

1
Efficient computation of the joint sample frequency spectra for multiple populations.
J Comput Graph Stat. 2017;26(1):182-194. doi: 10.1080/10618600.2016.1159212. Epub 2017 Feb 16.

Cited by

6
Leveraging graphical model techniques to study evolution on phylogenetic networks.
Philos Trans R Soc Lond B Biol Sci. 2025 Feb 13;380(1919):20230310. doi: 10.1098/rstb.2023.0310. Epub 2025 Feb 20.
9
Exact Decoding of a Sequentially Markov Coalescent Model in Genetics.
J Am Stat Assoc. 2024;119(547):2242-2255. doi: 10.1080/01621459.2023.2252570. Epub 2023 Oct 3.

References

5
Neutral genomic regions refine models of recent rapid human population growth.
Proc Natl Acad Sci U S A. 2014 Jan 14;111(2):757-62. doi: 10.1073/pnas.1310398110. Epub 2013 Dec 30.
7
Robust demographic inference from genomic and SNP data.
PLoS Genet. 2013 Oct;9(10):e1003905. doi: 10.1371/journal.pgen.1003905. Epub 2013 Oct 24.
