Terhorst Jonathan
Department of Statistics, University of Michigan, Ann Arbor, MI, USA.
Nat Genet. 2025 Sep 15. doi: 10.1038/s41588-025-02323-x.
This study introduces population history learning by averaging sampled histories (PHLASH), a new method for inferring population history from whole-genome sequence data. It works by drawing random, low-dimensional projections of the coalescent intensity function from the posterior distribution of a pairwise sequentially Markovian coalescent-like model and averaging them together to form an accurate and adaptive estimator. On simulated data, PHLASH tends to be faster and have lower error than several competing methods, including SMC++, MSMC2 and FITCOAL. Moreover, it provides automatic uncertainty quantification and leads to new Bayesian testing procedures for detecting population structure and ancient bottlenecks. The key technical advance is a new algorithm for computing the score function (gradient of the log likelihood) of a coalescent hidden Markov model, which has the same computational cost as evaluating the log likelihood. PHLASH has been released as an easy-to-use Python software package and leverages graphics processing unit acceleration when available.
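The averaging step described above can be illustrated with a toy sketch. It is not the PHLASH implementation: the draws come from a made-up random generator rather than from the posterior of a coalescent hidden Markov model, and the names `eval_step`, `sample_history` and `average_histories` are hypothetical. It only shows how many coarse step functions with different random breakpoints average into a finer-grained estimate.

```python
import bisect
import math
import random


def eval_step(breaks, values, t):
    """Evaluate a piecewise-constant function at time t.

    values[i] applies on [breaks[i], breaks[i + 1]); the last value
    extends to infinity. breaks must be sorted and start at 0.
    """
    i = bisect.bisect_right(breaks, t) - 1
    return values[max(i, 0)]


def sample_history(rng, n_pieces=4, t_max=10.0):
    """Toy stand-in for one posterior draw (assumption, not PHLASH's sampler).

    Returns a low-dimensional step function: a few random breakpoints
    and a positive random intensity on each piece.
    """
    breaks = [0.0] + sorted(rng.uniform(0.0, t_max) for _ in range(n_pieces - 1))
    values = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(n_pieces)]
    return breaks, values


def average_histories(histories, grid):
    """Pointwise mean of the sampled step functions on a common time grid."""
    return [
        sum(eval_step(b, v, t) for b, v in histories) / len(histories)
        for t in grid
    ]


rng = random.Random(0)
histories = [sample_history(rng) for _ in range(200)]
grid = [0.1 * k for k in range(100)]
estimate = average_histories(histories, grid)
```

Each draw has only four pieces, but because the breakpoints differ between draws, the averaged curve can change at many more time points than any single draw. The pointwise spread of the draws around that average is one simple way to read off uncertainty in a setup like this.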